Everlasting query if column contains missing values #257

Closed
davidbp opened this issue Apr 3, 2019 · 2 comments

@davidbp

davidbp commented Apr 3, 2019

Hello,

Filtering a dataframe on a column that contains missing values can result in a query that never finishes.

Let df be a dataframe. If column :x1 from df has missing values then

df[df[:x1].== "5",:]
# ERROR: ArgumentError: unable to check bounds for indices of type Missing

However,

result_query_everlasting = df |> @filter( _.x1 == "5") |> DataFrame   

takes forever (15 minutes computing until I killed it). Maybe it should return an error in the same way df[df[:x1].== "5",:] returns an error.

I have observed that the following code works:

result_query = df |> @filter(isequal(_.x1, "5")) |> DataFrame   # very fast
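
For reference, the difference between the two predicates can be seen with Base Julia alone. This is only a sketch of the `==` versus `isequal` semantics on a small hypothetical vector (the name `x1` and its values are made up for illustration); it does not reproduce Query.jl internals or the hang itself.

```julia
# Base Julia only; hypothetical sample data, not the df from this issue.
x1 = Union{Missing, String}["5", missing, "3", "5"]

# `==` follows three-valued logic: comparing with missing yields missing.
eq_result = missing == "5"

# `isequal` always returns a Bool, treating missing as an ordinary value.
iseq_result = isequal(missing, "5")

# A pure Bool mask built with isequal, safe to use for indexing.
mask_isequal = isequal.(x1, "5")

# Alternative: keep `==` but replace missing comparison results with false.
mask_coalesce = coalesce.(x1 .== "5", false)

# Indexing with a Bool mask works; a mask that still contains missing
# is what triggers the bounds-check error quoted above.
selected = x1[mask_isequal]
```

Either mask can be used in place of `df[:x1] .== "5"` when indexing; which one reads better is a matter of taste.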

Code to construct dataframe and test it


using Query, DataFrames, CSV

# building dummy data
n_rows = 50_000
n_cols = 20
df = DataFrame(rand(Int16,n_rows,n_cols ))
df[:x1] = Array{Union{Missing, String}}(repr.(rand(1:10,n_rows)))
df[:x1][df[:x1].=="1"] .= missing

# hangs forever (it might be better to simply return an error)
result_query_everlasting = df |> @filter( _.x1 == "5") |> DataFrame   

# Works
# result_filter = filter(row -> isequal(row.x1, "5"), df)

# ERROR: ArgumentError: unable to check bounds for indices of type Missing
# result_breaks = df[df[:x1].== "5",:]

# Works 
# result_normal = df[isequal.(df[:x1], "5"),:]

# Works
# result_query = df |> @filter(isequal(_.x1, "5")) |> DataFrame   # very fast

@davidanthoff
Member

Hm, very strange, that query is near instantaneous on my system... Which versions of Julia and of the packages are you using?

@davidbp
Author

davidbp commented Apr 4, 2019

We can close this.
I think I tried it in JuliaPro version 1.03, but I don't have it installed anymore. In Julia 1.1 it works well on both macOS and Ubuntu.

davidbp closed this as completed Apr 4, 2019