New Discussion

Notifications

You’re not receiving notifications from this thread.

Introduction to Importing from CSV Discussion

46
General

thanks

I had to come up with something similar at work a few years ago.

We have clients that need to mass create auction lots and send in Excel spreadsheets full of lot data. We needed a way to easily throw the XLS converted to CSV at the app and have it create the Lot objects. Supports blacklisting of columns that can be assigned from the CSV and typecasting column data.

This code is almost 3 years old and needs a serious refactor, but we don't use it enough to warrant sinking more time in to it right now.

https://gist.github.com/mgo...

(Using the end of the video to reference line numbers)

Why did you choose to use `user = User.create(...params...)` on L14, followed by `counter+=1 if user.persisted?` on L16 rather than `user = User.new(...params...)` and `counter+=1 if user.save`. I don't necessarily feel one is better than the other but I've always tended towards the latter. Is there an advantage one way or the other?

Great question. Either way works just the same in this case.

The only difference is that you have to worry about where the .save happens if you use that. With create, it doesn't really matter what order the next code is in. If you did save, you'd have to make sure it came first. Also, where the "save" happens is less clear if you have it on the right side of a line that has an "if" statement tacked on it.

In the end it doesn't make much difference at all because this code is very rough anyways. Ideally this gets refactored so that responsibilities are separated in a much cleaner fashion. More on that in a future episode though.

I'd love to see some information about how to generate some more robust dummy data as well perhaps using something like Faker (https://github.com/stympy/f...

Routine Tracker Routine Tracker

Hey @excid3:disqus. Has your caching just not caught up with the typo in the title? Seems like it should be "Introduction to Importing from CSV".

Ha! Thanks for catching that. I must have been tired.

Andrew Cockerham Andrew Cockerham

FYI The video doesn't show up in the browser so I had to download it...Chrome on Mac

Ah thanks for the heads up. That was my mistake.

Thanks for a great tutorial on CSV's it's just what I needed. As an aside, I was recently going through my gems and noticed FasterCSV as a dependency in a couple of them. Have you had a chance to play with it much?

I believe FasterCSV got rolled into Ruby 2.x at some point so you don't need to use it anymore.

Kohl Kohlbrenner Kohl Kohlbrenner

@excid3:disqus when you do email,first,last = row , how is each variable assigned? On each comma in the csv row?

Yeah, it's the items in the row in order. I just used a Ruby trick to split the array of items into separate variables.

irb(main):001:0> a, b, c = ["a", "b", "c"]
=> ["a", "b", "c"]
irb(main):002:0> a
=> "a"
irb(main):003:0> b
=> "b"
irb(main):004:0> c
=> "c"

A big difficulty is gathering the errors and viewing them in the resulting rails view...

Definitely. This is where form objects tend to make a lot more sense because it's that much more complicated. An intermediate step of uploading, processing, and finding errors before doing the actual import can be a good thing for this too.

Hey Chris, i'm setting up this feature for my app now. I've used the commentable episode to help me structure the app so that users can create a category and then add a startup(s) to that category. Essentially, startups in my app are startupables I'd like to be able to upload a csv of startups for a particular category. Any ideas?

I wrote a few rake tasks to import csv but i wonder if it is possible to write tests to make sure whether or not the data point in the right table and right cell...

Hey Chris,
I have added the functionality to import CSV files inside model. I want to rollback the transaction if any of the value is invalid in CSV file. How can i achieve that?

You should be able to just wrap it in a transaction block like so: http://api.rubyonrails.org/...

Hey Chris,

Just joined your site. Thanks for all the great videos. Was working through importing a CSV and I had a quick question regarding removing headers from the file.

In my application I had to use the following code for the loop:

CSV.foreach(filename, "r:ISO-8859-1") do |row|

When I tried adding the argument to ignore the header it didn't work. Is there something I'm missing?

The second argument there needs to be hash I'm pretty sure. You're passing in a string, so that's probably going to mismatch the arguments it expects. Try changing that to hash and that should fix it for you!

This is a savior. Thanks a lot

Hi Chris,

I'm stuck trying to import records from an excel spreadsheet.

Is there a way to convert my xls file into a csv file so I can follow along with your tutorial?

If not, please can you recommend a source of an excel importing tutorial.

In addition to trying to adapt your tutorial, I have tried each of these and can't get any of them to work:

http://www.mattmorgante.com...
https://richonrails.com/art...
https://deepakrip007.wordpr...

The problem seems to be one that other users here have encountered with their attempts. All of my attempts at solving this are outlined here: http://stackoverflow.com/qu...

Thanks
Mel

I have now found the roo, creek, spreadsheet and dullard gems. They talk about needing to convert xls files (which I have) to xlsx format. It all looks confusing. Maybe you could add another video to show how to change xls into csv so that these tutorials can be used to import data or you could do an overview of whichever of these gems is most suitable for importing data from xls files. I'm about to start trying with each of them now.

Hi Chris, I think I might have converted my xls file to a csv file and saved it to my root file. I've tried to follow this tutorial but keep getting errors when I try to run the rake command in the console. When you use the line be rake import:users, is the reference to 'users' a reference to the name of the model, the name of the task or something else? My task is called 'for_codes', my model is called Randd::Field. I've tried both in the console but keep getting errors that say: "NameError: undefined local variable or method `for_codes' for main:Object"

I've asked here to see if anyone else can see what's wrong: http://stackoverflow.com/qu...

Is anyone getting this error when trying to rake the task?

rake aborted!

ArgumentError: invalid byte sequence in UTF-8

So - it seems that saving my .xls file with a .csv extension inside my rails app (root level) didn't work to make the file eligible to be treated as a CSV file. I've now tried copying and pasting the content of my 2 column excel file into the .csv file saved in my rails app.

That pasted several hundred lines but without the | separating the cell content.

This is the first 3 lines (the first one is the header).

Is there a way to insert the pipes without having to go line by line to paste them in?

anz_referencetitle

010104pCombinatorics and Discrete Mathematics (excl. Physical Combinatorics)

010107pMathematical Logic, Set Theory, Lattices and Universal Algebra

Can anyone make sense of these instructions for selecting the first column of a long list of text? https://atom.io/packages/su...

I'm trying to manually add the "|" to separate the first and second column of the content of my xls file which is now pasted into a .csv file in my rails app. there are 1000s of lines.

I can't understand what it's telling me to do so that I can select the first column and then press paste to separate it from the second bit of content with the "|"

So - it seems that saving my .xls file with a .csv extension inside my rails app (root level) didn't work to make the file eligible to be treated as a CSV file. I've now tried copying and pasting the content of my 2 column excel file into the .csv file saved in my rails app.

That pasted several hundred lines but without the | separating the cell content.

This is the first 3 lines (the first one is the header).

Is there a way to insert the pipes without having to go line by line to paste them in?

anz_referencetitle

010104pCombinatorics and Discrete Mathematics (excl. Physical Combinatorics)

010107pMathematical Logic, Set Theory, Lattices and Universal Algebra

Actually - I tried the steps I described below with just a few lines, that I amended manually to insert a pipe between cell values.

When i try to rake the task I get an error that says:

bundle exec rake import:randd_fields --trace

** Invoke import:randd_fields (first_time)

** Invoke environment (first_time)

** Execute environment

** Execute import:randd_fields

nil

rake aborted!

NameError: undefined local variable or method `title' for main:Object

Has anyone been able to get this tutorial to work?

my client thought it was a simple feature, but I told him from the beginning it was not, now you confirm the fact. Thank you.

I am really stuck in trying to follow along with this. I moved my xls data into a google drive sheet and downloaded that as a csv saved in my root directory, but it still isn't working. Are there any other references to different tutorials that i can try. I tried those published by RichonRails and Matt Morgante but those don't work either. I tried the roo gem with the 2012 rails cast but that is out of date and I cant get that configured to work. Please can you go deeper into this so that the issue I keep running into is covered in your process. Thank you

Hello Chris.
I tried to run this code but i am getting this error, can you please help me.

https://uploads.disquscdn.c... https://uploads.disquscdn.c...

So @Chris how to you write Test for this solution?

Hey Chris -

You mentioned that User.create without the ! will allow an attempt to save to the db to fail silently if the record already exists, but isn't that only true if you've created a uniqueness constraint on the User model? That's the only way I could make it work. Using postgres.

Correct, you need validations for uniqueness so you don't get duplicates. Email is typically the field marked for uniqueness.

Hello Chris
Would you like to show me how we read the name of every sheet and all the object from one sheet . Because i am stuck hear. I always getting error like this
NoMethodError (undefined method `row' for "sheet_name":String):

looking for the furthur improvements.

I am also looking for the futhur information.
Thanks
Liteblue

Late to the party and all that but if anybody is looking to seed their project database from a CSV file, the is what I enter in my seed.rb file(s):

'''
CSV.read(Rails.root.join('lib', 'seeds', 'file_with_data_to_import.csv'), headers: true).each do |row|
t = ImportingModelName.create(row.to_hash)
puts "#{t.name}, #{t.some_other_attribute} saved"
end

// Optional part
puts "There are now #{ImportingModelName.count} rows in your ImportingModelName database table."
'''

The same can be used in a Rake Task file as per Chris’s video. It assumes that the CSV file has header row but, for a scheduled task, you could take care of those on the fly as per the video.

Performance-wise, it works rather well. For example, I had to seed a project database with a bit fewer than 25,000 records, which took about 2 minutes on my development machine. Each record had 40 columns, 20 of which had to be validated, and I had to modify the text case of the content in 6 of the columns.

Hope this helps someone.

Join the discussion
Create an account Log in

Want to stay up-to-date with Ruby on Rails?

Join 88,697+ developers who get early access to new tutorials, screencasts, articles, and more.

    We care about the protection of your data. Read our Privacy Policy.