Import from CSV into Ruby array, with first field as hash key, then look up a field's value given the header row

Question

Asked by Marcos

Maybe somebody can help me.

Starting with a CSV file like so:

Ticker,"Price","Market Cap"
ZUMZ,30.00,933.90
XTEX,16.02,811.57
AAC,9.83,80.02

I manage to read them into an array:

require 'csv'
tickers = CSV.read("stocks.csv", :headers => true, :return_headers => true, :header_converters => :symbol, :converters => :all)

To verify data, this works:

puts tickers[1][:ticker]
ZUMZ

However this doesn't:

puts tickers[:ticker => "XTEX"][:price]

How would I go about turning this array into a hash using the ticker field as a unique key, so that I could easily look up any other field associatively, by the names defined in line 1 of the input? The real file has many more columns and rows.

Much appreciated!

Answer 1

Accepted answer by Marcos

To get the best of both worlds (very fast reading from a huge file AND the benefits of a native Ruby CSV object), my code has since evolved into this method:

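require 'csv'

$stock = "XTEX"
# sed prints the header line (1p) plus every line that starts with the ticker.
sed_output = IO.popen(["sed", "-n", "1p; /^#{$stock},/p", "stocks.csv"], &:read)
csv_data = CSV.parse(sed_output, :headers => true, :return_headers => false, :header_converters => :symbol, :converters => :all)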

# Now the 1-row CSV object is ready for use, eg:
$company = csv_data[:company][0]
$volatility_month = csv_data[:volatility_month][0].to_f
$sector = csv_data[:sector][0]
$industry = csv_data[:industry][0]
$rsi14d = csv_data[:relative_strength_index_14][0].to_f

This is closer to my original method, but it reads in only one record plus line 1 of the input CSV file, which contains the headers. The inline sed instructions take care of that, and the whole thing is noticeably instant. This is better than my previous approach because I can now access all the fields from Ruby associatively, without caring about column numbers as I had to with awk.
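For readers who would rather not shell out to sed, the same "header line plus matching lines" idea can be sketched in plain Ruby. This is only an illustration: `parse_ticker_rows` is a hypothetical helper, not part of the answer, and like the sed pattern it assumes the ticker is an unquoted first field.

```ruby
require 'csv'

# Hypothetical helper (not from the answer): keep the header line plus every
# line that starts with "<ticker>,", then hand only those lines to CSV.parse.
def parse_ticker_rows(lines, ticker)
  kept = lines.each_with_index.select do |line, index|
    index.zero? || line.start_with?("#{ticker},")
  end
  CSV.parse(kept.map(&:first).join,
            headers: true, header_converters: :symbol, converters: :all)
end

# Sample rows copied from the question; for a real file, pass
# File.foreach("stocks.csv") instead of sample.lines.
sample = <<~EOS
  Ticker,"Price","Market Cap"
  ZUMZ,30.00,933.90
  XTEX,16.02,811.57
  AAC,9.83,80.02
EOS

csv_data = parse_ticker_rows(sample.lines, "XTEX")
price = csv_data[:price][0]
market_cap = csv_data[:market_cap][0]
puts "#{price} #{market_cap}"
```

Only the header and the matching lines ever reach the CSV parser, which is the same saving the sed call provides, although Ruby still scans every line of the file itself.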

Answer 2

Answered by Michael Kohl

Like this (it works with other CSVs too, not just the one you specified):

require 'csv'

tickers = {}

CSV.foreach("stocks.csv", :headers => true, :header_converters => :symbol, :converters => :all) do |row|
  tickers[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
end

Result:

{"ZUMZ"=>{:price=>30.0, :market_cap=>933.9}, "XTEX"=>{:price=>16.02, :market_cap=>811.57}, "AAC"=>{:price=>9.83, :market_cap=>80.02}}

You can access elements in this data structure like this:

puts tickers["XTEX"][:price] #=> 16.02

Edit (according to comment): For selecting elements, you can do something like this:

 tickers.select { |ticker, vals| vals[:price] > 10.0 }
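As a variation on the approach above, here is a minimal self-contained sketch that keys the hash by the column name (`:ticker`) instead of by field position. It assumes the key column is called "Ticker", as in the question's sample, and it parses an inline string so it runs without a file.

```ruby
require 'csv'

# Sample rows copied from the question.
sample = <<~EOS
  Ticker,"Price","Market Cap"
  ZUMZ,30.00,933.90
  XTEX,16.02,811.57
  AAC,9.83,80.02
EOS

table = CSV.parse(sample, headers: true, header_converters: :symbol, converters: :all)

# Key each row by its :ticker value; the remaining fields become the inner hash.
tickers = table.each_with_object({}) do |row, result|
  fields = row.to_h
  result[fields.delete(:ticker)] = fields
end

puts tickers["XTEX"][:price]

# Hash#select returns a hash, so .keys lists the matching tickers.
expensive = tickers.select { |_ticker, vals| vals[:price] > 10.0 }.keys
puts expensive.inspect
```

Looking the key up by name means the key column does not have to come first in the file.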

Answer 3

Answered by Mr. Demetrius Michael

CSV.read(file_path, headers:true, header_converters: :symbol, converters: :all).collect do |row|
  Hash[row.collect { |c,r| [c,r] }]
end
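The snippet above yields an array of hashes, one per row, rather than a hash keyed by ticker. A lookup on that structure could be sketched with `Array#find`, as below. The inline sample data comes from the question, and `row.to_h` is used here as a shorter stand-in for the `Hash[...]` construction.

```ruby
require 'csv'

# Sample rows copied from the question.
sample = <<~EOS
  Ticker,"Price","Market Cap"
  ZUMZ,30.00,933.90
  XTEX,16.02,811.57
  AAC,9.83,80.02
EOS

rows = CSV.parse(sample, headers: true, header_converters: :symbol, converters: :all).map(&:to_h)

# Linear search for the row whose :ticker matches; find returns nil if none does.
match = rows.find { |row| row[:ticker] == "XTEX" }
price = match && match[:price]
puts price
```

Each lookup scans the array, so for many lookups on a large file a hash keyed by ticker is likely the better fit.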

Answer 4

Answered by clouddra

To add on to Michael Kohl's answer, if you want to access the elements in the following manner:

puts tickers[:price]["XTEX"] #=> 16.02

You can try the following code snippet:

require 'csv'

tickers = {}  # column name => { ticker => value }

CSV.foreach("Workbook1.csv", :headers => true, :header_converters => :symbol, :converters => :all) do |row|
    # Pair every non-key column with [ticker, value], e.g. :price => ["XTEX", 16.02]
    hash_row =  row.headers[1..-1].zip( (Array.new(row.fields.length-1, row.fields[0]).zip(row.fields[1..-1])) ).to_h
    hash_row.each{|key, value| tickers[key] ? tickers[key].merge!([value].to_h) : tickers[key] = [value].to_h}
end
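The same column-first structure can be built with a nested loop, which some readers may find easier to follow. This is a sketch under the same assumption as the snippet above (the first field is the key). The name `by_column` is illustrative, and the data is parsed from an inline copy of the question's sample.

```ruby
require 'csv'

# Sample rows copied from the question.
sample = <<~EOS
  Ticker,"Price","Market Cap"
  ZUMZ,30.00,933.90
  XTEX,16.02,811.57
  AAC,9.83,80.02
EOS

table = CSV.parse(sample, headers: true, header_converters: :symbol, converters: :all)

# Outer hash: column name => inner hash; inner hash: ticker => value.
by_column = Hash.new { |hash, column| hash[column] = {} }

table.each do |row|
  key_header = row.headers[0]
  key = row.fields[0]
  row.each do |header, value|
    next if header == key_header  # skip the key column itself
    by_column[header][key] = value
  end
end

puts by_column[:price]["XTEX"]
```

Because of the default block, reading a column name that is not in the file creates an empty inner hash instead of returning nil.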

Answer 5

Answered by Jesse Smith

Not as much of a one-liner, but this was clearer to me.

require 'csv'

# Read the header line first, then stream the remaining rows from STDIN.
csv_headers = CSV.parse(STDIN.gets)
csv = CSV.new(STDIN)

kick_list = []
csv.each_with_index do |row, i|
  row_hash = {}
  row.each_with_index do |field, j|
    row_hash[csv_headers[0][j]] = field
  end
  kick_list << row_hash
end
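For comparison, the CSV library can do the header pairing itself when given `headers: true`. The sketch below produces the same kind of array of row hashes. A `StringIO` holding the question's sample stands in for `STDIN` so the example is self-contained.

```ruby
require 'csv'
require 'stringio'

# StringIO stands in for STDIN here; the rows are copied from the question.
input = StringIO.new(<<~EOS)
  Ticker,"Price","Market Cap"
  ZUMZ,30.00,933.90
  XTEX,16.02,811.57
  AAC,9.83,80.02
EOS

# With headers: true each row is a CSV::Row, and to_h pairs headers with fields.
kick_list = CSV.new(input, headers: true).map(&:to_h)
puts kick_list.first["Ticker"]
```

No converters are given, so keys and values stay strings, as they do in the explicit loop above.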

Answer 6

Answered by Marcos

While this isn't a 100% native Ruby solution to the original question, should others stumble here and wonder what awk call I wound up using for now, here it is:

$dividend_yield = IO.readlines("|awk -F, '$1==\"#{$stock}\" {print $9}' datafile.csv")[0].to_f

where $stock is the variable I had previously assigned a company's ticker symbol (the would-be key field). It conveniently survives problems by returning 0.0 if the ticker, file, or field #9 is not found or is empty, or if the value cannot be converted to a float. So any trailing '%' in my case gets nicely truncated.

Note that at this point one could easily add more filters within awk to have IO.readlines return a one-dimensional array of output lines from the smaller resulting CSV, e.g.:

 awk -F, '$9 >= 2.01 && $2 > 99.99 {print $0}' datafile.csv

This prints, in bash, the lines that have a DivYld (col 9) of at least 2.01 and a price (col 2) over 99.99. (Unfortunately I'm not using the header row to determine field numbers, which is where I was ultimately hoping for some searchable associative Ruby array.)
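A header-driven version of that filter could be sketched in Ruby roughly as follows. Everything specific here is an assumption: the post does not show datafile.csv, so the sample rows and the column names "Price" and "Dividend Yield" are invented for illustration.

```ruby
require 'csv'

# Invented sample data: the real datafile.csv and its column names are not
# shown in the post, so these rows and headers are placeholders.
sample = <<~EOS
  Ticker,Price,Dividend Yield
  AAA,120.50,2.50%
  BBB,45.10,3.10%
  CCC,150.00,1.20%
EOS

table = CSV.parse(sample, headers: true, header_converters: :symbol)

# Fields stay strings (no value converters), so to_f does the numeric
# conversion: "2.50%".to_f drops the trailing "%", and nil.to_f is 0.0.
matches = table.select do |row|
  row[:dividend_yield].to_f >= 2.01 && row[:price].to_f > 99.99
end

matches.each { |row| puts row.to_csv }
```

Columns are addressed by header name here, so reordering the file's columns would not change the filter.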
