I am training a CNN model for trading, using indicator and MA lines to compose a 2D array as input. Outliers in that input affect the results of an algorithm of any kind (image processing, machine learning, deep learning algorithm…).

Wikipedia defines outliers as "an observation point that is distant from other observations." That means some minority cases in the data set are different from the majority of the data. Perhaps the most commonly adopted definition is based on the distance between each data point and the mean.

Machine Learning is a part of Artificial Intelligence. Outliers in input data can skew and mislead the training process of machine learning algorithms, resulting in longer training times, less accurate models and … Dealing with outliers is itself a very challenging task in machine learning. During data analysis, once you detect an outlier, one of the most difficult decisions can be how to deal with it.

Univariate outliers exist when the value of one feature deviates from the other data points for that same feature. When you perform univariate analysis, you pay attention to one individual feature at a time. For example, you can clearly see the outlier in this list: [20, 24, 22, 19, 29, 18, 4300, 30, 18]. It is easy to identify when the observations are just a bunch of numbers in one dimension, but when you have thousands of observations or multiple dimensions, you will need more clever ways to detect those values.

Because the IQR and the standard deviation change after outliers are removed, repeating the detection may wrongly flag some new values as outliers.

An API for outlier detection was released as experimental in 7.3, and with 7.4, we've released a dedicated UI in machine learning for performing outlier detection.

We first created an empty dataframe named farm, then added features and values to it.
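To make the "distance from the mean" definition concrete, here is a minimal sketch that applies it to the example list from the text and compares it with an IQR-based rule. The function names, the z-score cutoff of 2.5, and the IQR multiplier of 1.5 are assumptions chosen for illustration, not values taken from the original article.

```python
import numpy as np

# The example list from the text; 4300 is the obvious outlier.
values = np.array([20, 24, 22, 19, 29, 18, 4300, 30, 18], dtype=float)


def zscore_outliers(x, threshold):
    """Flag points whose distance from the mean exceeds `threshold` standard deviations."""
    z = (x - x.mean()) / x.std()  # population standard deviation (ddof=0)
    return np.abs(z) > threshold


def iqr_outliers(x, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


# Assumed cutoff of 2.5; see the note below on why 3 would not work here.
z_flags = zscore_outliers(values, threshold=2.5)
iqr_flags = iqr_outliers(values)

print("z-score rule flags:", values[z_flags])
print("IQR rule flags:    ", values[iqr_flags])
```

Both rules flag only 4300 on this list. Note that the common cutoff of 3 would miss it: the extreme value inflates the mean and the standard deviation, and with nine points no single value can score above sqrt(8), about 2.83, when the population standard deviation is used. This is one reason quartile-based rules are often preferred on small samples.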
So this is the recipe for dealing with outliers in Python.

Step 1 - Import the library

import numpy as np
import pandas as pd

We have imported numpy and pandas.

Handling Outliers

There are two types of outliers: univariate and multivariate. When modeling, it is extremely important to clean the data sample to ensure that the observations best represent the problem.

The great advantage of Tukey's box plot method is that the statistics (e.g. … Two further topics are the Z-score and dealing with outliers when the interquartile range is 0.

After deleting the outliers, we should be careful not to run the outlier detection test again.
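The recipe above can be sketched roughly as follows. The column names and values in farm are hypothetical, since the original features are not shown in the text, and skipping a column whose interquartile range is 0 is one possible choice rather than the article's stated method.

```python
import numpy as np
import pandas as pd

# Create an empty dataframe named farm, then add features and values to it.
# Column names and values are hypothetical placeholders.
farm = pd.DataFrame()
farm["yield_kg"] = [20, 24, 22, 19, 29, 18, 4300, 30, 18]
farm["tractors"] = [3, 3, 3, 3, 3, 3, 3, 3, 9]  # Q1 == Q3, so the IQR is 0


def tukey_mask(series, k=1.5):
    """Return True for rows inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    if iqr == 0:
        # With IQR == 0 the fences collapse to a single value and every
        # differing point would be flagged. Assumption: keep all rows here.
        return pd.Series(True, index=series.index)
    return series.between(q1 - k * iqr, q3 + k * iqr)


# Compute every mask on the ORIGINAL data, then filter a single time,
# so the quartiles are not re-estimated on already-cleaned data.
keep = pd.Series(True, index=farm.index)
for col in farm.columns:
    keep &= tukey_mask(farm[col])

cleaned = farm[keep]
print(cleaned)
```

Here only the row containing 4300 is dropped. The row with 9 tractors is kept because that column's interquartile range is 0 and the sketch declines to judge it; a different fallback (for example a z-score on that column) would be equally defensible.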