NaN errors in Python

import pandas as pd
data = {'kol_click':[1, 8, 4, 2, 1, '', 18, '', 3, 10]}
df1 = pd.DataFrame(data)
df1['kol_click_1'] = df1['kol_click'].str.replace('', '0')
print(df1.head(10))

  kol_click kol_click_1
0         1         NaN
1         8         NaN
2         4         NaN
3         2         NaN
4         1         NaN
5                     0
6        18         NaN
7                     0
8         3         NaN
9        10         NaN

Why did the numbers turn into NaN? How can I replace the empty strings with zeros?

asked 3 Mar 2020 at 13:23

The problem is caused by mixing integers and strings in a single column. Pandas treats such a column as having dtype object:

In [17]: df1.dtypes
Out[17]:
kol_click    object
dtype: object

But you cannot comfortably work with such a column as if it were an ordinary string column: the .str accessor returns NaN for every element that is not a string.

In [25]: df1['kol_click'].str[:10]
Out[25]:
0    NaN
1    NaN
2    NaN
3    NaN
4    NaN
5
6    NaN
7
8    NaN
9    NaN
Name: kol_click, dtype: object

In [26]: df1['kol_click'].astype(str).str[:10]
Out[26]:
0     1
1     8
2     4
3     2
4     1
5
6    18
7
8     3
9    10
Name: kol_click, dtype: object

Solution: try the following:

In [22]: df1['kol_click_1'] = pd.to_numeric(df1['kol_click'], errors='coerce').fillna(0)

In [23]: df1
Out[23]:
  kol_click  kol_click_1
0         1          1.0
1         8          8.0
2         4          4.0
3         2          2.0
4         1          1.0
5                    0.0
6        18         18.0
7                    0.0
8         3          3.0
9        10         10.0

In [24]: df1.dtypes
Out[24]:
kol_click       object
kol_click_1    float64
dtype: object
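If integer counts are needed rather than floats, the converted column can be cast afterwards. The following is a minimal sketch, assuming every non-empty value in the column is a whole number; the column name kol_click_int is introduced here only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'kol_click': [1, 8, 4, 2, 1, '', 18, '', 3, 10]})

# Empty strings cannot be parsed as numbers, so errors='coerce' turns them into NaN;
# fillna(0) then replaces NaN with 0, and astype(int) restores an integer dtype.
df1['kol_click_int'] = (
    pd.to_numeric(df1['kol_click'], errors='coerce')
      .fillna(0)
      .astype(int)
)

print(df1['kol_click_int'].tolist())
```

The cast to int is only safe after fillna(), because an integer NumPy column cannot hold NaN.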

0xdb

answered 3 Mar 2020 at 13:36


I am using pandas to open a text document as follows.

input_data = pd.read_csv('input.tsv', header=0, delimiter="\t", quoting=3)
L = input_data["title"] + '. ' + input_data["description"]

I found that some of my text values are NaN. Therefore, I tried the following approach.

import math
for text in L:

    if not math.isnan(text):
        print(text)

However, this returned the following error: TypeError: must be real number, not str

Is there a way to identify NaN values among strings in Python?

My TSV looks as follows

id  title   description major   minor
27743058    Partial or total open meniscectomy? : A prospective, randomized study.  In order to compare partial with total meniscectomy a prospective clinical study of 200 patients was carried out. At arthrotomy 100 patients were allocated to each type of operation. The two groups did not differ in duration of symptoms, age distribution, or sex ratio. The operations were performed as conventional arthrotomies. One hundred and ninety two of the patients were seen at follow up 2 and 12 months after operation. There was no difference in the period off work between the two groups. One year after operation, 6 of the 98 patients treated with partial meniscectomy had undergone further operation. In all posterior tears were found at both procedures. Among the 94 patients undergoing total meniscectomy, 4 required further operation. In each, part of the posterior horn had been left at the primary procedure. One year after operation significantly more patients who had undergone partial meniscectomy had been relieved of symptoms. However, the two groups did not show any difference in the degree of radiological changes present.    ### ###
27743057        Synovial oedema is a frequent complication in arthroscopic procedures performed with normal saline as the irrigating fluid. The authors have studied the effect of saline solution, Ringer lactate, 5% Dextran and 10% Dextran in normal saline on 12 specimens of human synovial membrane. They found that 10% Dextran in normal saline decreases the water content of the synovium without causing damage, and recommend this solution for procedures lasting longer than 30 minutes. ### ###
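One possible way to find the missing entries, sketched under the assumption that the NaN values come from empty title or description cells: pandas.isna() accepts strings and floats alike, so it does not raise the TypeError that math.isnan() does. The small DataFrame below is made-up stand-in data, not the real file.

```python
import numpy as np
import pandas as pd

# Stand-in for the parsed TSV: the second row has no title (hypothetical data).
input_data = pd.DataFrame({
    "title": ["Partial or total open meniscectomy?", np.nan],
    "description": ["In order to compare ...", "Synovial oedema is ..."],
})

# Adding a string to NaN gives NaN, which is why some combined texts are missing.
combined = input_data["title"] + ". " + input_data["description"]
print(combined.isna().tolist())

# Option 1: skip missing entries; pd.isna() works for str and float values.
texts = [text for text in combined if not pd.isna(text)]

# Option 2: treat a missing title or description as an empty string.
filled = input_data["title"].fillna("") + ". " + input_data["description"].fillna("")
print(filled.tolist())
```

Which option fits depends on whether a description without a title is still useful for the downstream analysis.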


In the previous sections you saw how easily missing data can arise. In data structures they are represented as NaN (Not a Number) values. This kind of value is quite common in data analysis.

pandas is designed to handle them well. Below you will learn how to work with NaN so as to avoid possible problems. For example, when pandas computes descriptive statistics, it implicitly excludes all NaN values.

To explicitly assign NaN to an element of a data structure, use np.nan from the NumPy library (the np.NaN alias was removed in NumPy 2.0).

>>> ser = pd.Series([0,1,2,np.nan,9],
...                 index=['red','blue','yellow','white','green'])
>>> ser
red       0.0
blue      1.0
yellow    2.0
white     NaN
green     9.0
dtype: float64

Assigning None to an element of a numeric Series also produces NaN:

>>> ser['white'] = None
>>> ser
red       0.0
blue      1.0
yellow    2.0
white     NaN
green     9.0
dtype: float64
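The implicit exclusion of NaN from descriptive statistics can be checked directly. Here is a minimal sketch using the same Series; skipna is an existing parameter of the pandas reductions, and the comparison is only illustrative.

```python
import numpy as np
import pandas as pd

ser = pd.Series([0, 1, 2, np.nan, 9],
                index=['red', 'blue', 'yellow', 'white', 'green'])

# count() ignores NaN, so it reports 4 of the 5 elements.
print(ser.count())

# mean() skips NaN by default: (0 + 1 + 2 + 9) / 4
print(ser.mean())

# With skipna=False the NaN propagates into the result.
print(ser.mean(skipna=False))
```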

There are several ways to get rid of NaN values during data analysis. You could remove each element by hand, but that is tedious and error-prone, and it does not guarantee that you have really removed all of them. This is where the dropna() function helps.

>>> ser.dropna()
red       0.0
blue      1.0
yellow    2.0
green     9.0
dtype: float64

The same filtering can be done directly with notnull() when selecting elements.

>>> ser[ser.notnull()]
red       0.0
blue      1.0
yellow    2.0
green     9.0
dtype: float64

With a DataFrame it is a little more complicated. By default, dropna() removes every row that contains at least one NaN value, so a single NaN is enough for the whole row to be dropped (with axis=1 the same applies to columns).

>>> frame3 = pd.DataFrame([[6,np.nan,6],[np.nan,np.nan,np.nan],[2,np.nan,5]],
... 			  index = ['blue','green','red'],
... 			  columns = ['ball','mug','pen'])
>>> frame3

	ball	mug	pen
blue	6.0	NaN	6.0
green	NaN	NaN	NaN
red	2.0	NaN	5.0

>>> frame3.dropna()
Empty DataFrame
Columns: [ball, mug, pen]
Index: []

So, to avoid dropping entire rows or columns, use the how parameter with the value 'all'. This tells the function to drop only the rows or columns in which all elements are NaN.

>>> frame3.dropna(how='all')

	ball	mug	pen
blue	6.0	NaN	6.0
red	2.0	NaN	5.0
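The same how='all' rule can be applied to columns with axis=1. The sketch below rebuilds frame3 so that it runs on its own; the variable names no_empty_cols and cleaned are introduced here for illustration.

```python
import numpy as np
import pandas as pd

frame3 = pd.DataFrame([[6, np.nan, 6], [np.nan, np.nan, np.nan], [2, np.nan, 5]],
                      index=['blue', 'green', 'red'],
                      columns=['ball', 'mug', 'pen'])

# axis=1 applies the rule to columns: 'mug' is entirely NaN, so it is dropped.
no_empty_cols = frame3.dropna(axis=1, how='all')
print(no_empty_cols.columns.tolist())

# Chaining both calls removes the empty row and the empty column.
cleaned = frame3.dropna(how='all').dropna(axis=1, how='all')
print(cleaned)
```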

Filling NaN

Instead of filtering out NaN values from data structures, at the risk of removing important elements along with them, you can replace them with other numbers. The fillna() function is suitable for this. In its simplest form it takes one argument: the value that should replace NaN.

>>> frame3.fillna(0)

	ball	mug	pen
blue	6.0	0.0	6.0
green	0.0	0.0	0.0
red	2.0	0.0	5.0

Alternatively, NaN can be replaced with different values depending on the column by passing a dict that maps column names to fill values.

>>> frame3.fillna({'ball':1,'mug':0,'pen':99})

	ball	mug	pen
blue	6.0	0.0	6.0
green	1.0	0.0	99.0
red	2.0	0.0	5.0


17 Aug 2022

One error you may encounter when using pandas is:

ValueError: cannot convert float NaN to integer

This error occurs when you try to convert a column in a pandas DataFrame from float to integer, but the column contains NaN values.

The following example shows how to fix this error in practice.

How to reproduce the error

Suppose we create the following pandas DataFrame:

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23, 25, 29],
 'assists': [5, 7, 7, 9, 12, 9, 9, 4],
 'rebounds': [11, np.nan , 10, 6, 5, np.nan , 9, 12]})

#view DataFrame
df

   points  assists  rebounds
0      25        5      11.0
1      12        7       NaN
2      15        7      10.0
3      14        9       6.0
4      19       12       5.0
5      23        9       NaN
6      25        9       9.0
7      29        4      12.0

The 'rebounds' column currently has the float data type.

#print data type of 'rebounds' column
df['rebounds'].dtype

dtype('float64')

Suppose we try to convert the 'rebounds' column from float to integer:

#attempt to convert 'rebounds' column from float to integer
df['rebounds'] = df['rebounds'].astype(int)

ValueError: cannot convert float NaN to integer

We get a ValueError because the NaN values in the 'rebounds' column cannot be converted to integer values.

How to fix the error

The way to fix this error is to deal with the NaN values before trying to convert the column from float to integer.

We can use the following code to first identify the rows that contain NaN values:

#print rows in DataFrame that contain NaN in 'rebounds' column
print(df[df['rebounds'].isnull()])

   points  assists  rebounds
1      12        7       NaN
5      23        9       NaN

We can then either drop the rows with NaN values or replace the NaN values with some other value before converting the column from float to integer:

Method 1: Drop rows with NaN values

#drop all rows with NaN values
df = df.dropna()

#convert 'rebounds' column from float to integer
df['rebounds'] = df['rebounds'].astype(int)

#view updated DataFrame
df

   points  assists  rebounds
0      25        5        11
2      15        7        10
3      14        9         6
4      19       12         5
6      25        9         9
7      29        4        12

#view class of 'rebounds' column
df['rebounds'].dtype

dtype('int64')

Method 2: Replace NaN values

#replace all NaN values with zeros
df['rebounds'] = df['rebounds'].fillna(0)

#convert 'rebounds' column from float to integer
df['rebounds'] = df['rebounds'].astype(int)

#view updated DataFrame
df

   points  assists  rebounds
0      25        5        11
1      12        7         0
2      15        7        10
3      14        9         6
4      19       12         5
5      23        9         0
6      25        9         9
7      29        4        12

#view class of 'rebounds' column
df['rebounds'].dtype

dtype('int64')

Note that both methods avoid the ValueError and successfully convert the float column to an integer column.
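A further option, not part of the two methods above, is the pandas nullable integer dtype 'Int64' (with a capital I), which stores whole numbers while keeping missing entries as <NA>. This is a minimal sketch on the same data, assuming all non-missing values are whole numbers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, np.nan, 10, 6, 5, np.nan, 9, 12]})

# 'Int64' is the nullable extension dtype: NaN becomes <NA> instead of raising.
df['rebounds'] = df['rebounds'].astype('Int64')

print(df['rebounds'].dtype)
print(df['rebounds'].isna().sum())
```

This keeps all eight rows and does not invent a value such as 0 for the missing rebounds.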


When using a dataset for analysis, you must check your data to ensure it only contains finite numbers and no NaN values (Not a Number). If you try to pass a dataset that contains NaN or infinity values to a function for analysis, you will raise the error: ValueError: input contains nan, infinity or a value too large for dtype(‘float64’).

To solve this error, you can check your data set for NaN values using numpy.isnan() and infinite values using numpy.isfinite(). You can replace NaN values using nan_to_num() if your data is in a numpy array or SciKit-Learn’s SimpleImputer.

This tutorial will go through the error in detail and how to solve it with the help of code examples.


Python ValueError: input contains nan, infinity or a value too large for dtype(‘float64’)

What is a ValueError?

In Python, a value is the information stored within a particular object. You will encounter a ValueError in Python when you use a built-in operation or function that receives an argument with the right type but an inappropriate value.

What is a NaN in Python?

In Python, a NaN stands for Not a Number and represents undefined entries and missing values in a dataset.

What is inf in Python?

Infinity in Python is a floating-point value that compares greater than every other number (positive infinity) or smaller than every other number (negative infinity). Most arithmetic operations on an infinite value produce another infinite value, but some do not: for example, inf - inf is NaN and 1 / inf is 0.0. Infinity is a float value; there is no way to represent infinity as an integer. We can use float() to represent infinity as follows:

pos_inf=float('inf')

neg_inf=-float('inf')

print('Positive infinity: ', pos_inf)

print('Negative infinity: ', neg_inf)

Positive infinity:  inf
Negative infinity:  -inf

We can also use the math, decimal, sympy, and numpy modules to represent infinity in Python.
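Arithmetic with infinity has a few corner cases worth seeing in code, since not every operation on an infinite value yields infinity. Here is a minimal sketch using only the standard math module:

```python
import math

pos_inf = math.inf
neg_inf = -math.inf

# Most arithmetic with infinity stays infinite.
print(pos_inf + 1)
print(pos_inf * -2)

# Indeterminate forms such as inf - inf produce NaN instead.
difference = pos_inf - pos_inf
print(math.isnan(difference))

# Dividing a finite number by infinity gives zero.
print(1 / pos_inf)

# math.isinf() is True for both positive and negative infinity.
print(math.isinf(neg_inf))
```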

Let’s look at some examples where we want to clean our data of NaN and infinity values.

Example #1: Dataset with NaN Values

In this example, we will generate a dataset consisting of random numbers and then randomly populate the dataset with NaN values. We will try to cluster the values in the dataset using the AffinityPropagation in the Scikit-Learn library.

Note: The use of the AffinityPropagation to cluster on random data is just an example to demonstrate the source of the error. The function you are trying to use may be completely different to AffinityPropagation, but the data preprocessing described in this tutorial will still apply.

The data generation looks as follows:

# Import numpy and AffinityPropagation

import numpy as np

from sklearn.cluster import AffinityPropagation

# Number of NaN values to put into data

n = 4

data = np.random.randn(20)

# Get random indices in the data

index_nan = np.random.choice(data.size, n, replace=False)

# Replace data with NaN

data.ravel()[index_nan]=np.nan

print(data)

Let’s look at the data:

[-0.0063374  -0.974195    0.94467842  0.38736788  0.84908087         nan
  1.00582645         nan  1.87585201 -0.98264992 -1.64822932  1.24843544
  0.88220504 -1.4204208   0.53238027         nan  0.83446561         nan
 -0.04655628 -1.09054183]

The data consists of twenty random values, four of which are NaN, and the rest are numerical values. Let’s try to fit the data using the AffinityPropagation() class.

af= AffinityPropagation(random_state=5).fit([data])

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

We raise the error because the AffinityPropagation.fit() cannot handle NaN, infinity or extremely large values. Our data contains NaN values, and we need to preprocess the data to replace them with suitable values.

Solution #1: using nan_to_num()

To check if a dataset contains NaN values, we can use the isnan() function from NumPy. If we pair this function with any(), we will check if there are any instances of NaN. We can replace the NaN values using the nan_to_num() method. Let’s look at the code and the clean data:

print(np.any(np.isnan(data)))

data = np.nan_to_num(data)

print(data)

True
[-0.0063374  -0.974195    0.94467842  0.38736788  0.84908087  0.
  1.00582645  0.          1.87585201 -0.98264992 -1.64822932  1.24843544
  0.88220504 -1.4204208   0.53238027  0.          0.83446561  0.
 -0.04655628 -1.09054183]

The np.any() part of the code returns True because our dataset contains at least one NaN value. The clean data has zeros in place of the NaN values. Let’s fit on the clean data:

af= AffinityPropagation(random_state=5).fit([data])

This code will execute without any errors.

Solution #2: using SimpleImputer

Scikit-Learn provides a class for imputation called SimpleImputer. We can use the SimpleImputer to replace NaN values. To replace NaN values in a one-dimensional dataset, we need to set the strategy parameter in the SimpleImputer to constant. First, we will generate the data:

import numpy as np

n = 4

data = np.random.randn(20)

index_nan = np.random.choice(data.size, n, replace=False)

data.ravel()[index_nan]=np.nan

print(data)

The data looks like this:

[ 1.4325319   0.61439789  0.3614522   1.38531346         nan  0.6900916
  0.50743745  0.48544145         nan         nan  0.17253557         nan
 -1.05027802  0.09648188  1.15971533  0.29005307  2.35040023  0.44103513
 -0.03235852 -0.78142219]

We can use the SimpleImputer class to fit and transform the data as follows:

from sklearn.impute import SimpleImputer

imp_mean = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0)

imputer = imp_mean.fit([data])

data = imputer.transform([data])

print(data)

The clean data looks like this:

[[ 1.4325319   0.61439789  0.3614522   1.38531346  0.          0.6900916
   0.50743745  0.48544145  0.          0.          0.17253557  0.
  -1.05027802  0.09648188  1.15971533  0.29005307  2.35040023  0.44103513
  -0.03235852 -0.78142219]]

And we can pass the clean data to the AffinityPropagation clustering method as follows:

af= AffinityPropagation(random_state=5).fit(data)

We can also use the SimpleImputer class on multi-dimensional data to replace NaN values using the mean along each column. We have to set the imputation strategy to “mean”, and using the mean is only valid for numeric data. Let’s look at an example of a 3×3 nested list that contains NaN values:

from sklearn.impute import SimpleImputer

data = [[7, 2, np.nan], 
        [4, np.nan, 6], 
        [10, 5, 9]]

We can replace the NaN values as follows:

imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')

imp_mean.fit(data)

data = imp_mean.transform(data)

print(data)

[[ 7.   2.   7.5]
 [ 4.   3.5  6. ]
 [10.   5.   9. ]]

We replaced the np.nan values with the mean of the real numbers along the columns of the nested list. For example, in the third column, the real numbers are 6 and 9, so the mean is 7.5, which replaces the np.nan value in the third column.

We can also use the other imputation strategies, median and most_frequent.
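The column-wise statistics used for imputation can be reproduced with NumPy alone, which may help when checking imputed values by hand. This is a sketch of the arithmetic only, not of scikit-learn's internals; the variable names are introduced here for illustration.

```python
import numpy as np

data = np.array([[7, 2, np.nan],
                 [4, np.nan, 6],
                 [10, 5, 9]], dtype=float)

# Column means computed while ignoring NaN: 7.0, 3.5 and 7.5.
col_means = np.nanmean(data, axis=0)

# Replace each NaN with the mean of its own column.
filled_mean = np.where(np.isnan(data), col_means, data)
print(filled_mean)

# The same idea with the median of each column.
col_medians = np.nanmedian(data, axis=0)
filled_median = np.where(np.isnan(data), col_medians, data)
print(filled_median)
```

With only two or three known values per column, the mean and the median coincide here; on skewed data they can differ noticeably.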

Example #2: Dataset with NaN and inf Values

This example will generate a dataset consisting of random numbers and then randomly populate the dataset with NaN and infinity values. We will try to cluster the values in the dataset using the AffinityPropagation in the Scikit-Learn library. The data generation looks as follows:

import numpy as np

from sklearn.cluster import AffinityPropagation

n = 4

data = np.random.randn(20)

index_nan = np.random.choice(data.size, n, replace=False)

index_inf = np.random.choice(data.size, n, replace=False)

data.ravel()[index_nan]=np.nan

data.ravel()[index_inf]=np.inf

print(data)

[-0.76148741         inf  0.10339756         nan         inf -0.75013509
  1.2740893          nan -1.68682986         nan  0.57540185 -2.0435754
  0.99287213         inf  0.5838198          inf -0.62896815 -0.45368201
  0.49864775 -1.08881703]

The data consists of twenty random values, four of which are NaN, four are infinity, and the rest are numerical values. Let’s try to fit the data using the AffinityPropagation() class.

af= AffinityPropagation(random_state=5).fit([data])

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

We raise the error because the dataset contains NaN values and infinity values.

Solution #1: Using nan_to_num

To check if a dataset contains NaN values, we can use the isnan() function from NumPy. If we pair this function with any(), we will check if there are any instances of NaN.

To check if a dataset contains infinite values, we can use the isfinite() function from NumPy, which returns False for NaN as well as for positive and negative infinity. If we pair this function with all(), the result is True only when every value is finite.

We can replace the NaN and infinity values using the nan_to_num() function. By default it sets NaN values to zero, positive infinity to the largest finite float and negative infinity to the most negative finite float. Let’s look at the code and the clean data:

print(np.any(np.isnan(data)))

print(np.all(np.isfinite(data)))

data = np.nan_to_num(data)

print(data)

True

False

[-7.61487414e-001  1.79769313e+308  1.03397556e-001  0.00000000e+000
  1.79769313e+308 -7.50135085e-001  1.27408930e+000  0.00000000e+000
 -1.68682986e+000  0.00000000e+000  5.75401847e-001 -2.04357540e+000
  9.92872128e-001  1.79769313e+308  5.83819800e-001  1.79769313e+308
 -6.28968155e-001 -4.53682014e-001  4.98647752e-001 -1.08881703e+000]

We replaced the NaN values with zeroes and the infinity values with 1.79769313e+308. We can fit on the clean data as follows:

af= AffinityPropagation(random_state=5).fit([data])

This code will execute without any errors. If we do not want to replace infinity with a very large number but with zero, we can first convert the infinity values (of either sign) to NaN using:

data[np.isinf(data)] = np.nan

We can then pass the data to nan_to_num(), which converts all the NaN values to zeroes.
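nan_to_num() also accepts the nan, posinf and neginf keyword arguments (available since NumPy 1.17), so the replacement values can be set in a single call. Below is a minimal sketch on a small hand-written array rather than the random data used above.

```python
import numpy as np

data = np.array([0.5, np.inf, np.nan, -np.inf, -1.25])

# Replace NaN and both infinities with zero in one call.
clean = np.nan_to_num(data, nan=0.0, posinf=0.0, neginf=0.0)
print(clean)

# Every value is finite afterwards.
print(np.all(np.isfinite(clean)))
```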

Solution #2: Using fillna()

We can use Pandas to convert our dataset to a DataFrame and replace the NaN and infinity values using the Pandas fillna() method. First, let’s look at the data generation:

import numpy as np

import pandas as pd

from sklearn.cluster import AffinityPropagation

n = 4

data = np.random.randn(20)

index_nan = np.random.choice(data.size, n, replace=False)

index_inf = np.random.choice(data.size, n, replace=False)

data.ravel()[index_nan]=np.nan

data.ravel()[index_inf]=np.inf

print(data)

[ 0.41339801         inf         nan  0.7854321   0.23319745         nan
  0.50342482         inf -0.82102161 -0.81934623  0.23176869 -0.61882322
  0.12434801 -0.21218049         inf -1.54067848         nan  1.78086445
         inf  0.4881174 ]

The data consists of twenty random values, four of which are NaN, four are infinity, and the rest are numerical values. We can convert the numpy array to a DataFrame as follows:

df = pd.DataFrame(data)

Once we have the DataFrame, we can use the replace method to replace the infinity values with NaN values. Then, we will call the fillna() method to replace all NaN values in the DataFrame.

df.replace([np.inf, -np.inf], np.nan, inplace=True)

df = df.fillna(0)

We can use the to_numpy() method to convert the DataFrame back to a numpy array as follows:

data = df.to_numpy()

print(data)

[[ 0.41339801]
 [ 0.        ]
 [ 0.        ]
 [ 0.7854321 ]
 [ 0.23319745]
 [ 0.        ]
 [ 0.50342482]
 [ 0.        ]
 [-0.82102161]
 [-0.81934623]
 [ 0.23176869]
 [-0.61882322]
 [ 0.12434801]
 [-0.21218049]
 [ 0.        ]
 [-1.54067848]
 [ 0.        ]
 [ 1.78086445]
 [ 0.        ]
 [ 0.4881174 ]]

We can now fit on the clean data using the AffinityPropagation class as follows:

af= AffinityPropagation(random_state=5).fit(data)

print(af.cluster_centers_)

The clustering algorithm gives us the following cluster centres:

[[ 0.        ]
 [ 0.50342482]
 [-0.81934623]
 [-1.54067848]
 [ 1.78086445]]

We can also use Pandas to drop rows (or, with axis=1, columns) that contain NaN values using the dropna() method.

Solution #3: using SimpleImputer

Let’s look at an example of using the SimpleImputer to replace NaN and infinity values. First, we will look at the data generation:

import numpy as np

n = 4

data = np.random.randn(20)

index_nan = np.random.choice(data.size, n, replace=False)

index_inf = np.random.choice(data.size, n, replace=False)

data.ravel()[index_nan]=np.nan

data.ravel()[index_inf]=np.inf

print(data)

[-0.5318616          nan  0.12842066         inf         inf         nan
  1.24679674  0.09636847  0.67969774  1.2029146          nan  0.60090616
 -0.46642723         nan  1.58596659  0.47893738  1.52861316         inf
 -1.36273437         inf]

The data consists of twenty random values, four of which are NaN, four are infinity, and the rest are numerical values. Let’s try to use the SimpleImputer to clean our data:

from sklearn.impute import SimpleImputer

imp_mean = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0)

imputer = imp_mean.fit([data])

data = imputer.transform([data])

print(data)

ValueError: Input contains infinity or a value too large for dtype('float64').

We raise the error because the SimpleImputer method does not support infinite values. To solve this error, you can replace the np.inf with np.nan values as follows:

data[data==np.inf] = np.nan

imp_mean = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0)

imputer = imp_mean.fit([data])

data = imputer.transform([data])

print(data)

With all infinity values replaced with NaN values, we can use the SimpleImputer to transform the data. Let’s look at the clean dataset:

[[-0.5318616   0.          0.12842066  0.          0.          0.
   1.24679674  0.09636847  0.67969774  1.2029146   0.          0.60090616
  -0.46642723  0.          1.58596659  0.47893738  1.52861316  0.
  -1.36273437  0.        ]]

Consider the case where we have multi-dimensional data with NaN and infinity values, and we want to use the SimpleImputer class. In that case, we can replace the infinite values using the Pandas replace() method as follows:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

data = [[7, 2, np.nan], 
        [4, np.nan, 6], 
        [10, 5, np.inf]]

df = pd.DataFrame(data)

df.replace([np.inf, -np.inf], np.nan, inplace=True)

data = df.to_numpy()

Then we can use the SimpleImputer to fit and transform the data. In this case, we will replace the missing values with the mean along the column where each NaN value occurs.

imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')

imp_mean.fit(data)

data = imp_mean.transform(data)

print(data)

The clean data looks like this:

[[ 7.   2.   6. ]
 [ 4.   3.5  6. ]
 [10.   5.   6. ]]

Summary

Congratulations on reading to the end of this tutorial! If you pass a NaN or an infinite value to a function, you may raise the error: ValueError: input contains nan, infinity or a value too large for dtype(‘float64’). This commonly occurs as a result of not preprocessing data before analysis. To solve this error, check your data for NaN and inf values and either remove them or replace them with real numbers.

You can only replace NaN values with the SimpleImputer method. If you try to replace infinity values with the SimpleImputer, you will raise the ValueError. Ensure that you convert all positive and negative infinity values to NaN before using the SimpleImputer.
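The conversion of both positive and negative infinity to NaN can be wrapped in a small helper. The function name inf_to_nan is hypothetical and introduced here only for illustration; np.isinf() is used because a comparison with np.inf alone would miss negative infinity.

```python
import numpy as np

def inf_to_nan(values):
    """Return a float copy of `values` with +inf and -inf replaced by NaN."""
    arr = np.array(values, dtype=float)   # np.array copies its input by default
    arr[np.isinf(arr)] = np.nan           # np.isinf is True for both signs
    return arr

data = [[7, 2, np.nan],
        [4, -np.inf, 6],
        [10, 5, np.inf]]

cleaned = inf_to_nan(data)
print(cleaned)
print(np.isinf(cleaned).any())
```

The result contains only finite numbers and NaN, so it can be handed to an imputer that replaces NaN.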


You should try this instead:

 print "{} green bottles, hanging on the wall".format(numberToText(i))

My Reasoning for Recommending format

Originally, in Python the common way to perform string interpolation (which is what we’re talking about here) was to use format strings (such as %s, %d, or %g) and a format operator between your string and the things that were to be inserted into it.

However, you have to use the correct format string for the kind of thing you want to interpolate.

For instance, to insert a string into another string, you use %s:

>>> print "%s green bottles, hanging on the wall" % "fifteen"
fifteen green bottles, hanging on the wall

However, if you have a number you wish to interpolate, you would typically use %d:

>>> print "%d green bottles, hanging on the wall" % 15
15 green bottles, hanging on the wall

If you make a mistake, you get an error:

>>> print "%d green bottles, hanging on the wall" % "15"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: %d format: a number is required, not str

You seem to be new to Python, so I recommend just using format instead, which is a newer and more powerful way to perform string interpolation:

 >>> print "{} green bottles, hanging on the {}".format(20, "balcony")
 20 green bottles, hanging on the balcony

When you get more comfortable with string interpolation, you can try to do more advanced stuff with format. You will probably never need to use % again to perform string interpolation. I recommend just trying to focus on and remember format.


Today we will learn about NaN and how to check whether a given value, such as a string, is NaN in Python. NaN stands for Not a Number. It is a special floating-point value that represents an undefined or unrepresentable result. For example, in floating-point arithmetic the square root of a negative number is NaN (numpy.sqrt(-1) returns nan, while math.sqrt(-1) raises a ValueError instead), and subtracting infinity from infinity also gives NaN. So, basically, NaN represents an undefined value in a computing system.

How to Check if a string is NaN in Python

We can check whether a value is NaN by using the fact that NaN is not equal to itself (NaN != NaN).

Let us define a boolean function isNaN() which returns True if the given argument is a NaN and returns False otherwise.

import numpy as np

def isNaN(string):
    # Only NaN compares unequal to itself
    return string != string
print(isNaN("hello"))
print(isNaN(np.nan))

The output of the above code will be

False
True

We can also take a value and convert it to float to check whether it is NaN. For this, we import the math module and use the math.isnan() function. See the code below.

import math

def isnan(value):
    # float() accepts numbers and numeric strings such as 'NaN';
    # anything it cannot parse is treated as "not NaN"
    try:
        return math.isnan(float(value))
    except (TypeError, ValueError):
        return False

print(isnan('hello'))
print(isnan('NaN'))
print(isnan(100))
print(isnan(str()))

Output:

False
True
False
False

A NaN can also be used to represent a missing value in computation. See the below code:

import numpy as np
l=['abc', 'xyz', 'pqr', np.nan]
print(l)
l_new=['missing' if x is np.nan else x for x in l]
print(l_new)

Output:

['abc', 'xyz', 'pqr', nan]
['abc', 'xyz', 'pqr', 'missing']
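The identity test `x is np.nan` only matches the np.nan object itself, so a NaN created elsewhere, for example with float('nan'), would slip through. Here is a sketch of a more general variant using pandas.isna(), assuming pandas is available:

```python
import numpy as np
import pandas as pd

l = ['abc', 'xyz', float('nan'), np.nan, None]

# pd.isna() is True for any float NaN and for None, and False for strings.
l_new = ['missing' if pd.isna(x) else x for x in l]
print(l_new)
```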
