Error: Input contains NaN, infinity or a value too large for dtype('float64')

Remove all infinite values:

(and replace with min or max for that column)

import numpy as np

# generate example matrix
matrix = np.random.rand(5,5)
matrix[0,:] = np.inf
matrix[2,:] = -np.inf
>>> matrix
array([[       inf,        inf,        inf,        inf,        inf],
       [0.87362809, 0.28321499, 0.7427659 , 0.37570528, 0.35783064],
       [      -inf,       -inf,       -inf,       -inf,       -inf],
       [0.72877665, 0.06580068, 0.95222639, 0.00833664, 0.68779902],
       [0.90272002, 0.37357483, 0.92952479, 0.072105  , 0.20837798]])

# find min and max values for each column, ignoring nan, -inf, and inf
mins = [np.nanmin(matrix[:, i][matrix[:, i] != -np.inf]) for i in range(matrix.shape[1])]
maxs = [np.nanmax(matrix[:, i][matrix[:, i] != np.inf]) for i in range(matrix.shape[1])]

# go through matrix one column at a time and replace  + and -infinity 
# with the max or min for that column
for i in range(matrix.shape[1]):
    matrix[:, i][matrix[:, i] == -np.inf] = mins[i]
    matrix[:, i][matrix[:, i] == np.inf] = maxs[i]

>>> matrix
array([[0.90272002, 0.37357483, 0.95222639, 0.37570528, 0.68779902],
       [0.87362809, 0.28321499, 0.7427659 , 0.37570528, 0.35783064],
       [0.72877665, 0.06580068, 0.7427659 , 0.00833664, 0.20837798],
       [0.72877665, 0.06580068, 0.95222639, 0.00833664, 0.68779902],
       [0.90272002, 0.37357483, 0.92952479, 0.072105  , 0.20837798]])
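The same replacement can be sketched without the per-column loop. This is a minimal sketch, not the original author's method: it assumes a 2-D float array in which every column has at least one finite value, and the names `finite`, `col_min`, `col_max`, and `cleaned` are illustrative.

```python
import numpy as np

# Example matrix with one row of +inf and one row of -inf
rng = np.random.default_rng(0)
matrix = rng.random((5, 5))
matrix[0, :] = np.inf
matrix[2, :] = -np.inf

# Mask of entries that are neither NaN nor +/-inf
finite = np.isfinite(matrix)

# Column-wise min and max computed over the finite entries only
col_min = np.where(finite, matrix, np.inf).min(axis=0)
col_max = np.where(finite, matrix, -np.inf).max(axis=0)

# Broadcast the per-column values into the infinite positions
cleaned = np.where(matrix == np.inf, col_max, matrix)
cleaned = np.where(cleaned == -np.inf, col_min, cleaned)

print(cleaned)
```

Unlike the loop above, this version builds a new array and leaves `matrix` unchanged.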


When using a dataset for analysis, you must check your data to ensure it only contains finite numbers and no NaN values (Not a Number). If you try to pass a dataset that contains NaN or infinity values to a function for analysis, you will raise the error: ValueError: input contains nan, infinity or a value too large for dtype(‘float64’).

To solve this error, you can check your dataset for NaN values using numpy.isnan() and for non-finite values using numpy.isfinite(). You can replace NaN values using numpy.nan_to_num() if your data is in a NumPy array, or using Scikit-Learn’s SimpleImputer.

This tutorial will go through the error in detail and how to solve it with the help of code examples.

Python ValueError: input contains nan, infinity or a value too large for dtype(‘float64’)
- What is a ValueError?
- What is a NaN in Python?
- What is inf in Python?
Example #1: Dataset with NaN Values
- Solution #1: using nan_to_num()
- Solution #2: using SimpleImputer
Example #2: Dataset with NaN and inf Values
- Solution #1: Using nan_to_num
- Solution #2: Using fillna()
- Solution #3: using SimpleImputer
Summary

Python ValueError: input contains nan, infinity or a value too large for dtype(‘float64’)

What is a ValueError?

In Python, a value is the information stored within a particular object. You will encounter a ValueError in Python when you use a built-in operation or function that receives an argument with the right type but an inappropriate value.

What is a NaN in Python?

In Python, NaN stands for Not a Number. It is a special floating-point value that represents undefined results and is commonly used to mark missing values in a dataset.

What is inf in Python?

Infinity in Python is a float value that compares greater than every finite number (or, for negative infinity, smaller than every finite number). Most arithmetic operations on an infinite value produce another infinite value, but not all of them: for example, inf - inf and inf * 0 evaluate to NaN, and dividing a finite number by infinity gives 0.0. There is no way to represent infinity as an integer. We can use float() to represent infinity as follows:

pos_inf=float('inf')

neg_inf=-float('inf')

print('Positive infinity: ', pos_inf)

print('Negative infinity: ', neg_inf)

Positive infinity:  inf
Negative infinity:  -inf

We can also use the math, decimal, sympy, and numpy modules to represent infinity in Python.
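As a small illustration of the points above, the following sketch compares the infinity constants from the standard library and NumPy and shows a few arithmetic results. The variable names are illustrative only.

```python
import math
import numpy as np

pos_inf = float("inf")

# The different spellings of infinity all compare equal
same_value = pos_inf == math.inf == np.inf
print(same_value)

# Adding a finite number to infinity leaves it infinite
still_inf = pos_inf + 1
print(still_inf)

# Some operations on infinity are undefined and give NaN instead
undefined = pos_inf - pos_inf
print(undefined)

# Dividing a finite number by infinity gives zero
ratio = 1 / pos_inf
print(ratio)

# math.isinf() and math.isnan() test for these special values
print(math.isinf(still_inf), math.isnan(undefined))
```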

Let’s look at some examples where we want to clean our data of NaN and infinity values.

Example #1: Dataset with NaN Values

In this example, we will generate a dataset consisting of random numbers and then randomly populate the dataset with NaN values. We will try to cluster the values in the dataset using the AffinityPropagation in the Scikit-Learn library.

Note: The use of the AffinityPropagation to cluster on random data is just an example to demonstrate the source of the error. The function you are trying to use may be completely different to AffinityPropagation, but the data preprocessing described in this tutorial will still apply.

The data generation looks as follows:

# Import numpy and AffinityPropagation

import numpy as np

from sklearn.cluster import AffinityPropagation

# Number of NaN values to put into data

n = 4

data = np.random.randn(20)

# Get random indices in the data

index_nan = np.random.choice(data.size, n, replace=False)

# Replace data with NaN

data.ravel()[index_nan]=np.nan

print(data)

Let’s look at the data:

[-0.0063374  -0.974195    0.94467842  0.38736788  0.84908087         nan
  1.00582645         nan  1.87585201 -0.98264992 -1.64822932  1.24843544
  0.88220504 -1.4204208   0.53238027         nan  0.83446561         nan
 -0.04655628 -1.09054183]

The data consists of twenty random values, four of which are NaN, and the rest are numerical values. Let’s try to fit the data using the AffinityPropagation() class.

af= AffinityPropagation(random_state=5).fit([data])

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

We raise the error because the AffinityPropagation.fit() cannot handle NaN, infinity or extremely large values. Our data contains NaN values, and we need to preprocess the data to replace them with suitable values.

Solution #1: using nan_to_num()

To check if a dataset contains NaN values, we can use the isnan() function from NumPy. Pairing this function with any() tells us whether there is at least one NaN in the data. We can then replace the NaN values using the nan_to_num() function. Let’s look at the code and the clean data:

print(np.any(np.isnan(data)))

data = np.nan_to_num(data)

print(data)

True
[-0.0063374  -0.974195    0.94467842  0.38736788  0.84908087  0.
  1.00582645  0.          1.87585201 -0.98264992 -1.64822932  1.24843544
  0.88220504 -1.4204208   0.53238027  0.          0.83446561  0.
 -0.04655628 -1.09054183]

The np.any() part of the code returns True because our dataset contains at least one NaN value. The clean data has zeros in place of the NaN values. Let’s fit on the clean data:

af= AffinityPropagation(random_state=5).fit([data])

This code will execute without any errors.
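Zero is not always a sensible replacement value. As one possible alternative, the sketch below fills NaN entries with the mean of the remaining values, using np.nanmean(). The small array here is made up for illustration and is not the tutorial's random data.

```python
import numpy as np

# Hypothetical data with two missing values
data = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# Mean of the non-NaN values only
fill_value = np.nanmean(data)

# Keep existing values and use the mean wherever the data is NaN
filled = np.where(np.isnan(data), fill_value, data)

print(filled)
```

Whether the mean, the median, or a constant is appropriate depends on the data and on the analysis that follows.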

Solution #2: using SimpleImputer

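Scikit-Learn provides a class for imputation called SimpleImputer. We can use the SimpleImputer to replace NaN values. To replace NaN values in a one-dimensional dataset, we set the strategy parameter of the SimpleImputer to constant. Because we pass the data as a single row ([data]), each column holds just one value, so a column-wise statistic such as the mean cannot be computed for the columns that contain only NaN. First, we will generate the data: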

import numpy as np

n = 4

data = np.random.randn(20)

index_nan = np.random.choice(data.size, n, replace=False)

data.ravel()[index_nan]=np.nan

print(data)

The data looks like this:

[ 1.4325319   0.61439789  0.3614522   1.38531346         nan  0.6900916
  0.50743745  0.48544145         nan         nan  0.17253557         nan
 -1.05027802  0.09648188  1.15971533  0.29005307  2.35040023  0.44103513
 -0.03235852 -0.78142219]

We can use the SimpleImputer class to fit and transform the data as follows:

from sklearn.impute import SimpleImputer

imp_mean = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0)

imputer = imp_mean.fit([data])

data = imputer.transform([data])

print(data)

The clean data looks like this:

[[ 1.4325319   0.61439789  0.3614522   1.38531346  0.          0.6900916
   0.50743745  0.48544145  0.          0.          0.17253557  0.
  -1.05027802  0.09648188  1.15971533  0.29005307  2.35040023  0.44103513
  -0.03235852 -0.78142219]]

And we can pass the clean data to the AffinityPropagation clustering method as follows:

af= AffinityPropagation(random_state=5).fit(data)

We can also use the SimpleImputer class on multi-dimensional data to replace NaN values using the mean along each column. We have to set the imputation strategy to “mean”, and using the mean is only valid for numeric data. Let’s look at an example of a 3×3 nested list that contains NaN values:

from sklearn.impute import SimpleImputer

data = [[7, 2, np.nan], 
        [4, np.nan, 6], 
        [10, 5, 9]]

We can replace the NaN values as follows:

imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')

imp_mean.fit(data)

data = imp_mean.transform(data)

print(data)

[[ 7.   2.   7.5]
 [ 4.   3.5  6. ]
 [10.   5.   9. ]]

We replaced the np.nan values with the mean of the real numbers along the columns of the nested list. For example, in the third column, the real numbers are 6 and 9, so the mean is 7.5, which replaces the np.nan value in the third column.

We can also use the other imputation strategies, median and most_frequent.
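The following sketch shows how these two strategies might be used. The 5×2 dataset is hypothetical and chosen so that the two strategies give different fill values; it assumes scikit-learn is installed.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: one NaN in each column
data = np.array([[1.0, 2.0],
                 [1.0, np.nan],
                 [4.0, 2.0],
                 [6.0, 9.0],
                 [np.nan, 10.0]])

# Replace each NaN with the median of its column
median_imputer = SimpleImputer(missing_values=np.nan, strategy="median")
data_median = median_imputer.fit_transform(data)

# Replace each NaN with the most frequent value of its column
mode_imputer = SimpleImputer(missing_values=np.nan, strategy="most_frequent")
data_mode = mode_imputer.fit_transform(data)

print(data_median)
print(data_mode)
```

The median is less sensitive to outliers than the mean, while most_frequent can also be applied to categorical data.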

Example #2: Dataset with NaN and inf Values

This example will generate a dataset consisting of random numbers and then randomly populate the dataset with NaN and infinity values. We will try to cluster the values in the dataset using the AffinityPropagation in the Scikit-Learn library. The data generation looks as follows:

import numpy as np

from sklearn.cluster import AffinityPropagation

n = 4

data = np.random.randn(20)

index_nan = np.random.choice(data.size, n, replace=False)

index_inf = np.random.choice(data.size, n, replace=False)

data.ravel()[index_nan]=np.nan

data.ravel()[index_inf]=np.inf

print(data)

[-0.76148741         inf  0.10339756         nan         inf -0.75013509
  1.2740893          nan -1.68682986         nan  0.57540185 -2.0435754
  0.99287213         inf  0.5838198          inf -0.62896815 -0.45368201
  0.49864775 -1.08881703]

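The data consists of twenty random values, four of which are infinity, while the rest are NaN or finite numbers. Because index_nan and index_inf are drawn independently, the two sets of indices can overlap, and an infinity then overwrites a NaN; in the output above, only three NaN values remain. Let’s try to fit the data using the AffinityPropagation() class.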

af= AffinityPropagation(random_state=5).fit([data])

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

We raise the error because the dataset contains NaN values and infinity values.

Solution #1: Using nan_to_num

To check if a dataset contains NaN values, we can use the isnan() function from NumPy. If we pair this function with any(), we will check if there are any instances of NaN.

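To check if a dataset contains infinite values, we can use the isfinite() function from NumPy. If we pair this function with all(), the result is True only when every value is finite, so a result of False tells us that the data contains at least one NaN or infinity value. To test for infinity alone, NumPy also provides isinf().

We can replace the NaN and infinity values using the nan_to_num() function. By default, it sets NaN values to zero, positive infinity to the largest finite float64 value, and negative infinity to the most negative finite float64 value. Let’s look at the code and the clean data: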

print(np.any(np.isnan(data)))

print(np.all(np.isfinite(data)))

data = np.nan_to_num(data)

print(data)

True

False

[-7.61487414e-001  1.79769313e+308  1.03397556e-001  0.00000000e+000
  1.79769313e+308 -7.50135085e-001  1.27408930e+000  0.00000000e+000
 -1.68682986e+000  0.00000000e+000  5.75401847e-001 -2.04357540e+000
  9.92872128e-001  1.79769313e+308  5.83819800e-001  1.79769313e+308
 -6.28968155e-001 -4.53682014e-001  4.98647752e-001 -1.08881703e+000]

We replaced the NaN values with zeroes and the infinity values with 1.79769313e+308. We can fit on the clean data as follows:

af= AffinityPropagation(random_state=5).fit([data])

This code will execute without any errors. If we do not want to replace infinity with a very large number but with zero, we can first convert the infinity values (both positive and negative) to NaN using:

data[np.isinf(data)] = np.nan

And then pass the data to the nan_to_num() function, converting all the NaN values to zeroes.
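The two steps can also be combined into one call. This sketch assumes NumPy 1.17 or later, where nan_to_num() accepts the nan, posinf, and neginf keyword arguments; the input array is made up for illustration.

```python
import numpy as np

# Hypothetical data containing NaN and both infinities
data = np.array([1.5, np.nan, np.inf, -np.inf, -2.0])

# Choose the replacement for each kind of non-finite value explicitly
clean = np.nan_to_num(data, nan=0.0, posinf=0.0, neginf=0.0)

print(clean)
```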

Solution #2: Using fillna()

We can use Pandas to convert our dataset to a DataFrame and replace the NaN and infinity values using the Pandas fillna() method. First, let’s look at the data generation:

import numpy as np

import pandas as pd

from sklearn.cluster import AffinityPropagation

n = 4

data = np.random.randn(20)

index_nan = np.random.choice(data.size, n, replace=False)

index_inf = np.random.choice(data.size, n, replace=False)

data.ravel()[index_nan]=np.nan

data.ravel()[index_inf]=np.inf

print(data)

[ 0.41339801         inf         nan  0.7854321   0.23319745         nan
  0.50342482         inf -0.82102161 -0.81934623  0.23176869 -0.61882322
  0.12434801 -0.21218049         inf -1.54067848         nan  1.78086445
         inf  0.4881174 ]

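The data consists of twenty random values, four of which are infinity, while the rest are NaN or finite numbers. As before, the NaN and infinity indices are drawn independently, so they can overlap; in the output above, three NaN values remain. We can convert the numpy array to a DataFrame as follows: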

df = pd.DataFrame(data)

Once we have the DataFrame, we can use the replace method to replace the infinity values with NaN values. Then, we will call the fillna() method to replace all NaN values in the DataFrame.

df.replace([np.inf, -np.inf], np.nan, inplace=True)

df = df.fillna(0)

We can use the to_numpy() method to convert the DataFrame back to a numpy array as follows:

data = df.to_numpy()

print(data)

[[ 0.41339801]
 [ 0.        ]
 [ 0.        ]
 [ 0.7854321 ]
 [ 0.23319745]
 [ 0.        ]
 [ 0.50342482]
 [ 0.        ]
 [-0.82102161]
 [-0.81934623]
 [ 0.23176869]
 [-0.61882322]
 [ 0.12434801]
 [-0.21218049]
 [ 0.        ]
 [-1.54067848]
 [ 0.        ]
 [ 1.78086445]
 [ 0.        ]
 [ 0.4881174 ]]

We can now fit on the clean data using the AffinityPropagation class as follows:

af= AffinityPropagation(random_state=5).fit(data)

print(af.cluster_centers_)

The clustering algorithm gives us the following cluster centres:

[[ 0.        ]
 [ 0.50342482]
 [-0.81934623]
 [-1.54067848]
 [ 1.78086445]]

We can also use Pandas to drop rows with NaN values using the dropna() method, or columns with NaN values by passing axis=1. For further reading on using Pandas for data preprocessing, go to the article: Introduction to Pandas: A Complete Tutorial for Beginners.
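A short sketch of dropna() on a hypothetical two-column DataFrame shows the difference between dropping rows and dropping columns:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with a single NaN in column "a"
df = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                   "b": [4.0, 5.0, 6.0]})

# Default behaviour: drop every row that contains a NaN
rows_dropped = df.dropna()

# axis=1: drop every column that contains a NaN
cols_dropped = df.dropna(axis=1)

print(rows_dropped)
print(cols_dropped)
```

Note that dropna() does not treat infinity as missing, so infinite values would still need to be converted to NaN first.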

Solution #3: using SimpleImputer

Let’s look at an example of using the SimpleImputer to replace NaN and infinity values. First, we will look at the data generation:

import numpy as np

n = 4

data = np.random.randn(20)

index_nan = np.random.choice(data.size, n, replace=False)

index_inf = np.random.choice(data.size, n, replace=False)

data.ravel()[index_nan]=np.nan

data.ravel()[index_inf]=np.inf

print(data)

[-0.5318616          nan  0.12842066         inf         inf         nan
  1.24679674  0.09636847  0.67969774  1.2029146          nan  0.60090616
 -0.46642723         nan  1.58596659  0.47893738  1.52861316         inf
 -1.36273437         inf]

The data consists of twenty random values, four of which are NaN, four are infinity, and the rest are numerical values. Let’s try to use the SimpleImputer to clean our data:

from sklearn.impute import SimpleImputer

imp_mean = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0)

imputer = imp_mean.fit([data])

data = imputer.transform([data])

print(data)

ValueError: Input contains infinity or a value too large for dtype('float64').

We raise the error because SimpleImputer does not support infinite values. To solve this error, you can replace the infinite values (positive and negative) with np.nan as follows:

data[np.isinf(data)] = np.nan

imp_mean = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0)

imputer = imp_mean.fit([data])

data = imputer.transform([data])

print(data)

With all infinity values replaced with NaN values, we can use the SimpleImputer to transform the data. Let’s look at the clean dataset:

[[-0.5318616   0.          0.12842066  0.          0.          0.
   1.24679674  0.09636847  0.67969774  1.2029146   0.          0.60090616
  -0.46642723  0.          1.58596659  0.47893738  1.52861316  0.
  -1.36273437  0.        ]]

Consider the case where we have multi-dimensional data with NaN and infinity values, and we want to use the SimpleImputer. In that case, we can replace the infinite values by using the Pandas replace() method as follows:

import numpy as np

import pandas as pd

from sklearn.impute import SimpleImputer

data = [[7, 2, np.nan], 
        [4, np.nan, 6], 
        [10, 5, np.inf]]

df = pd.DataFrame(data)

df.replace([np.inf, -np.inf], np.nan, inplace=True)

data = df.to_numpy()

Then we can use the SimpleImputer to fit and transform the data. In this case, we will replace the missing values with the mean along the column where each NaN value occurs.

imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')

imp_mean.fit(data)

data = imp_mean.transform(data)

print(data)

The clean data looks like this:

[[ 7.   2.   6. ]
 [ 4.   3.5  6. ]
 [10.   5.   6. ]]

Summary

Congratulations on reading to the end of this tutorial! If you pass a NaN or an infinite value to a function, you may raise the error: ValueError: input contains nan, infinity or a value too large for dtype(‘float64’). This commonly occurs as a result of not preprocessing data before analysis. To solve this error, check your data for NaN and inf values and either remove them or replace them with real numbers.

You can only replace NaN values with the SimpleImputer method. If you try to replace infinity values with the SimpleImputer, you will raise the ValueError. Ensure that you convert all positive and negative infinity values to NaN before using the SimpleImputer.



So I’m trying to write a piece of code that can predict "pr10tournaments" from the CSV data. I am running into an error that says:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

Here is the code

from os import sep
import sklearn
import tensorflow
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import pickle
from matplotlib import style
from sklearn import linear_model
from sklearn.utils import shuffle 



data = pd.read_csv("pr.csv", sep=";")

data = data[["pr10tournaments", "pr10tournamentsv2", "pr10tournamentsv3", "pr10tournamentsv4", "pr10tournamentsv5"]]



predict = "pr10tournaments"

x = np.array(data.drop([predict], 1))
y = np.array(data[predict])



x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)

linear = linear_model.LinearRegression()

linear.fit(x_train, y_train)
acc = linear.score(x_test,y_test)
print(acc)

Here is my csv file

playername;pr10tournaments;pr10tournamentsv2;pr10tournamentsv3;pr10tournamentsv4;pr10tournamentsv5;
"REET";"11410";"4680";"18482";"2345";7175
"Cented";"16225";"8122";"16445";"12740";5897
"Deyy";"10995";"9187";"21375";"6180";13862
"Edgey";"22150";"7087";"17612";5792

I am new to machine learning and I’m not certain but believe the error is with the numbers being too large and if that is the case is there some sort of way around this? Thanks.
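Judging only from the CSV excerpt shown, the numbers themselves are well within float64 range. A more likely cause is the last row, which has fewer fields than the others, so pandas would fill the missing field with NaN. The sketch below is a hedged way to check this: it loads the posted rows from a string (an assumption made for illustration; the real file is pr.csv) and counts missing values per column.

```python
import io
import pandas as pd

# The rows exactly as posted; the last row is one field short
csv_text = '''playername;pr10tournaments;pr10tournamentsv2;pr10tournamentsv3;pr10tournamentsv4;pr10tournamentsv5;
"REET";"11410";"4680";"18482";"2345";7175
"Cented";"16225";"8122";"16445";"12740";5897
"Deyy";"10995";"9187";"21375";"6180";13862
"Edgey";"22150";"7087";"17612";5792
'''

data = pd.read_csv(io.StringIO(csv_text), sep=";")

columns = ["pr10tournaments", "pr10tournamentsv2", "pr10tournamentsv3",
           "pr10tournamentsv4", "pr10tournamentsv5"]
data = data[columns]

# Count missing values per column to locate the problem
missing_per_column = data.isna().sum()
print(missing_per_column)

# One option: drop incomplete rows before training
clean = data.dropna()
print(len(clean))
```

If the missing value is a data-entry problem, fixing the CSV itself is preferable to dropping the row.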


One common error you may encounter when using Python is:

ValueError: Input contains infinity or a value too large for dtype('float64').

This error usually occurs when you attempt to use some function from the scikit-learn module, but the DataFrame or matrix you’re using as input has NaN values or infinite values.

The following example shows how to resolve this error in practice.

How to Reproduce the Error

Suppose we have the following pandas DataFrame:

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'x1': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4],
                   'x2': [1, 3, 3, 5, 2, 2, 1, np.inf, 0, 3, 4],
                   'y': [np.nan, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90]})

#view DataFrame
print(df)

    x1   x2     y
0    1  1.0   NaN
1    2  3.0  78.0
2    2  3.0  85.0
3    4  5.0  88.0
4    2  2.0  72.0
5    1  2.0  69.0
6    5  1.0  94.0
7    4  inf  94.0
8    2  0.0  88.0
9    4  3.0  92.0
10   4  4.0  90.0

Now suppose we attempt to fit a multiple linear regression model using functions from scikit-learn:

from sklearn.linear_model import LinearRegression

#initiate linear regression model
model = LinearRegression()

#define predictor and response variables
X, y = df[['x1', 'x2']], df.y

#fit regression model
model.fit(X, y)

#print model intercept and coefficients
print(model.intercept_, model.coef_)

ValueError: Input contains infinity or a value too large for dtype('float64').

We receive an error since the DataFrame we’re using has both infinite and NaN values.
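Before changing the data, it can help to see where the problem values are. The sketch below rebuilds the same DataFrame and counts the non-finite entries in each column; the names `non_finite`, `bad_counts`, and `bad_rows` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x1': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4],
                   'x2': [1, 3, 3, 5, 2, 2, 1, np.inf, 0, 3, 4],
                   'y': [np.nan, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90]})

# True wherever a value is NaN, inf or -inf
non_finite = ~np.isfinite(df)

# Number of non-finite values in each column
bad_counts = non_finite.sum()
print(bad_counts)

# Rows that contain at least one non-finite value
bad_rows = df[non_finite.any(axis=1)]
print(bad_rows)
```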

How to Fix the Error

The way to resolve this error is to first remove any rows from the DataFrame that contain infinite or NaN values:

#remove rows with any values that are not finite
df_new = df[np.isfinite(df).all(axis=1)]

#view updated DataFrame
print(df_new)

    x1   x2     y
1    2  3.0  78.0
2    2  3.0  85.0
3    4  5.0  88.0
4    2  2.0  72.0
5    1  2.0  69.0
6    5  1.0  94.0
8    2  0.0  88.0
9    4  3.0  92.0
10   4  4.0  90.0

The two rows that had infinite or NaN values have been removed.

We can now proceed to fit our linear regression model:

from sklearn.linear_model import LinearRegression

#initiate linear regression model
model = LinearRegression()

#define predictor and response variables
X, y = df_new[['x1', 'x2']], df_new.y

#fit regression model
model.fit(X, y)

#print model intercept and coefficients
print(model.intercept_, model.coef_)

69.85144124168515 [ 5.72727273 -0.93791574]

Notice that we don’t receive any error this time because we first removed the rows with infinite or NaN values from the DataFrame.



In this article, I will provide code examples in Python to resolve ValueError: input contains nan, infinity or a value too large for dtype(‘float64’). As the message indicates, the error occurs when the data contains NaN or infinity. Such values can’t be processed because they are not finite numbers that an estimator can compute with.

Error Code – Let’s first replicate the error –

import numpy as np

matrix = np.random.rand(5,5)
matrix[0,:] = np.inf
matrix[2,:] = -np.inf

print(matrix)

# Output:
array([[       inf,        inf,        inf,        inf,        inf],
       [0.87362809, 0.28321499, 0.7427659 , 0.37570528, 0.35783064],
       [      -inf,       -inf,       -inf,       -inf,       -inf],
       [0.72877665, 0.06580068, 0.95222639, 0.00833664, 0.68779902],
       [0.90272002, 0.37357483, 0.92952479, 0.072105  , 0.20837798]])

This matrix contains infinite values. If you pass it to a scikit-learn function, for example an estimator's fit() method, you will get this error –

valueerror: input contains nan, infinity or a value too large for dtype('float64')

Solutions

The obvious solution is to check for NaN and infinity in your matrix and replace those values with something meaningful and workable.

Method 1 – Check NaN & infinity using np.any() & np.all()

np.any(np.isnan(matrix))
np.all(np.isfinite(matrix))
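The two checks above only answer yes or no. As a small extension, the helper below reports how many values of each kind are present. The function name `count_non_finite` is hypothetical and is not part of NumPy.

```python
import numpy as np

def count_non_finite(arr):
    """Return the number of NaN, +inf and -inf entries in an array."""
    arr = np.asarray(arr, dtype=float)
    return {
        "nan": int(np.isnan(arr).sum()),
        "posinf": int(np.isposinf(arr).sum()),
        "neginf": int(np.isneginf(arr).sum()),
    }

# Hypothetical 2x2 matrix with one value of each kind
matrix = np.array([[1.0, np.nan],
                   [np.inf, -np.inf]])

counts = count_non_finite(matrix)
print(counts)
```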

Method 2 – For dataframes, use this function for cleaning –

def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df.dropna(inplace=True)
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)

Method 3 – Reset index of dataframe –

df = df.reset_index()

Note that reset_index() keeps the old index as a new column in the dataframe; pass drop=True to discard it instead.

Method 4 – Replace NaN & infinite with some value –

df.replace([np.inf, -np.inf], np.nan, inplace=True)

The above code will replace all infinite values with NaN. Next, we will replace NaN with some number –

df.fillna(999, inplace=True)
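A fixed number such as 999 can distort later analysis. As one possible alternative, the sketch below fills each gap with the median of its column. The DataFrame is hypothetical and assumes all columns are numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric DataFrame with inf, -inf and NaN
df = pd.DataFrame({"x1": [1.0, np.inf, 3.0, 5.0],
                   "x2": [2.0, 4.0, np.nan, -np.inf]})

# Step 1: treat infinite values as missing
df = df.replace([np.inf, -np.inf], np.nan)

# Step 2: fill each missing value with its column median
df = df.fillna(df.median())

print(df)
```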

Method 5 – Using numpy nan_to_num() function –

df = np.nan_to_num(df)

Method 6 – For X_train –

X_train = X_train.replace((np.inf, -np.inf, np.nan), 0).reset_index(drop=True)

Method 7 – Detect all NaN and infinite in your data –

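# p is assumed to be a 2-D NumPy array; this scans its first column
for index, value in enumerate(p[:, 0]):
    if not np.isfinite(value):
        print(index, value)

This will print the position and value of every entry in the first column of p that is not finite, including NaN and infinity.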
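To search every column at once, a vectorized variant can be sketched with np.argwhere(). The array `p` below is a made-up example; in practice it would be your own data.

```python
import numpy as np

# Hypothetical 3x2 array with NaN and inf in several places
p = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [np.inf, np.nan]])

# (row, column) positions of all non-finite entries
bad_positions = np.argwhere(~np.isfinite(p))

for row, col in bad_positions:
    print(row, col, p[row, col])
```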

Method 8 – Dropping all NaN & infinite –

df = df.replace([np.inf, -np.inf], np.nan)
df = df.dropna()
df = df.reset_index()

Method 9 – Replace NaN & infinite with max float64 –

inputArray[inputArray == np.inf] = np.finfo(np.float64).max
inputArray[inputArray == -np.inf] = np.finfo(np.float64).min
