{"id":1022,"date":"2020-10-19T07:00:00","date_gmt":"2020-10-19T04:00:00","guid":{"rendered":"https:\/\/www.dataplatform.gr\/?p=1022"},"modified":"2025-06-12T18:11:20","modified_gmt":"2025-06-12T15:11:20","slug":"ti-einai-ta-montela-sto-data-science","status":"publish","type":"post","link":"https:\/\/www.dataplatform.gr\/en\/ti-einai-ta-montela-sto-data-science\/","title":{"rendered":"What are models in Data Science"},"content":{"rendered":"<p>What are models in Data Science?<\/p>\n\n\n\n<p>In one sentence: models are built so that we can make predictions about a trend we are investigating.<\/p>\n\n\n\n<p>There are two categories of models:&nbsp;<strong>supervised<\/strong>&nbsp;models, which we train on data whose answers are already known, and unsupervised models, which look for structure in unlabeled data (neural networks are one of several ways to do this).<\/p>\n\n\n\n<p>In this article we deal with the first category, which we split here into 3 subcategories:&nbsp;<strong>regression<\/strong>, <strong>classification<\/strong> and <strong>decision tree<\/strong>.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"\u03bf\u03b9-\u03c0\u03b9\u03bf-\u03b3\u03bd\u03c9\u03c3\u03c4\u03ad\u03c2-\u03ba\u03b1\u03c4\u03b7\u03b3\u03bf\u03c1\u03af\u03b5\u03c2-\u03cc\u03c4\u03b1\u03bd-\u03c7\u03c1\u03b7\u03c3\u03b9\u03bc\u03bf\u03c0\u03bf\u03b9\u03bf\u03cd\u03bc\u03b5-regression\">The best-known types of regression<\/h5>\n\n\n\n<p><strong>Linear<\/strong>, where we try to draw a straight line that passes as close as possible to all the data points.<\/p>\n\n\n\n<p><strong>Polynomial<\/strong>, where, depending on its degree, the model has more parameters so that the curve can approach more points.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"\u03c4\u03bf-\u03c0\u03c1\u03cc\u03b2\u03bb\u03b7\u03bc\u03b1-\u03c0\u03bf\u03c5-\u03b4\u03b7\u03bc\u03b9\u03bf\u03c5\u03c1\u03b3\u03b5\u03af\u03c4\u03b1\u03b9\">The problem that arises<\/h5>\n\n\n\n<p>The difficulty is often finding the right balance: the better the model fits the points, the greater the 
chance that future points will deviate more from it, and so we have <strong>overfitting<\/strong>.<\/p>\n\n\n\n<p>We have <strong>underfitting<\/strong> when the model does not pass close to most of the points; in that case we should consider changing the model type or increasing the degree.<\/p>\n\n\n\n<p>Each model is first trained on one part of the data, and the remaining part is used for testing.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"\u03c4\u03bf-\u03c0\u03cc\u03c3\u03bf-\u03b1\u03ba\u03c1\u03b9\u03b2\u03ad\u03c2-\u03b5\u03af\u03bd\u03b1\u03b9-\u03c4\u03bf-\u03bc\u03bf\u03bd\u03c4\u03ad\u03bb\u03bf\">How accurate the model is<\/h5>\n\n\n\n<p>It is judged by the value of <strong>R^2<\/strong> (how little the values deviate from the model line), which typically lies between 0 and 1, with values closer to 1 meaning a better fit.<\/p>\n\n\n\n<p>It is also judged by <strong>RMSE<\/strong> (the square root of the mean of the squared differences between the predicted values and the actual values), which is zero or above; zero would mean a perfect model, which is practically impossible.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"\u03b1\u03bd\u03b1\u03bb\u03c5\u03c4\u03b9\u03ba\u03cc-\u03c0\u03b1\u03c1\u03ac\u03b4\u03b5\u03b9\u03b3\u03bc\u03b1\">Detailed example<\/h4>\n\n\n\n<p>First we&#039;ll load all the libraries we might need so we don&#039;t get confused later.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">import itertools\n\nimport numpy as np\n\nimport matplotlib.pyplot as plt\n\nfrom matplotlib.ticker import NullFormatter\n\nimport pandas as pd\n\nimport matplotlib.ticker as ticker\n\nfrom sklearn import preprocessing\n\nfrom sklearn.ensemble import RandomForestRegressor\n\nfrom sklearn.linear_model import 
RidgeClassifier\n\nfrom sklearn.model_selection import cross_val_score\n\nfrom sklearn.model_selection import train_test_split\n\nfrom sklearn.tree import DecisionTreeClassifier\n\nfrom sklearn.metrics import f1_score\n\nfrom sklearn.metrics import jaccard_score  # jaccard_similarity_score was removed in scikit-learn 0.23\n\nfrom sklearn import svm\n\nfrom sklearn.impute import SimpleImputer\n\nfrom sklearn.linear_model import Ridge\n\nfrom sklearn.linear_model import LinearRegression\n\nfrom sklearn.preprocessing import StandardScaler,PolynomialFeatures\n\nimport seaborn as sns\n\n%matplotlib inline\n\n#!conda install -c anaconda seaborn -y<\/pre>\n\n\n\n<p>We load a csv with car details into a dataframe, keep only the columns we want and add a column with the average fuel consumption.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">df = pd.read_csv('https:\/\/gist.githubusercontent.com\/smatzouranis\/acd3354f30ecc1e7cb90caee84650c3a\/raw\/61adad1fca973303f4af8bc378b3a5432b7371e7\/autos_csv.csv')\n\n# .copy() so that adding a column below does not trigger a SettingWithCopyWarning\n\ndf = df[['make','fuel-type','horsepower','city-mpg','highway-mpg','price']].copy()\n\ndf['AVG-mpg'] = (df['city-mpg']+df['highway-mpg'])\/2\n\ndff = df[['make','fuel-type','horsepower','AVG-mpg','price']]\n\ndff.head()<\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"408\" height=\"190\" src=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/1-mod.png\" alt=\"\" class=\"wp-image-1026\" srcset=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/1-mod.png 408w, https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/1-mod-300x140.png 300w\" sizes=\"auto, (max-width: 408px) 100vw, 408px\" \/><\/figure>\n\n\n\n<h5 class=\"wp-block-heading\" 
id=\"bar-plot\">Bar plot<\/h5>\n\n\n\n<p>Let&#039;s make a quick bar plot of the price per brand.<\/p>\n\n\n\n<p>It only takes 5 lines of code.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">makers = dff['make']\n\nprices = dff['price']\n\nfig, ax = plt.subplots(figsize=(8, 8))\n\nplt.style.use('fivethirtyeight')\n\nax.barh(makers, prices)<\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"494\" src=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/2-mod.png\" alt=\"\" class=\"wp-image-1027\" srcset=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/2-mod.png 624w, https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/2-mod-300x238.png 300w\" sizes=\"auto, (max-width: 624px) 100vw, 624px\" \/><\/figure>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"\u03c0\u03c1\u03bf\u03b5\u03c4\u03bf\u03b9\u03bc\u03b1\u03c3\u03af\u03b1-\u03c4\u03c9\u03bd-\u03b4\u03b5\u03b4\u03bf\u03bc\u03ad\u03bd\u03c9\u03bd\">Data preparation<\/h5>\n\n\n\n<p>Because the fuel type of each car is stored as text, we use the command .get_dummies to create new columns, one for diesel and one for gas, holding 0 or 1 depending on the fuel the car uses. We then drop the original text column; for the drop to be applied to the dataframe itself we need the parameter <strong>inplace=True<\/strong>.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">dff = pd.concat([dff,pd.get_dummies(dff['fuel-type'])], axis=1)\n\ndff.drop(['fuel-type'], axis = 
1,inplace=True)\n\ndff.head()<\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"407\" height=\"198\" src=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/3-mod.png\" alt=\"\" class=\"wp-image-1028\" srcset=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/3-mod.png 407w, https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/3-mod-300x146.png 300w\" sizes=\"auto, (max-width: 407px) 100vw, 407px\" \/><\/figure>\n\n\n\n<p>Now we fill in the missing values, including the price of the cars that do not have one, with the average of the rest.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">X = dff[dff.columns.difference(['price'])]\n\n# numeric_only=True skips the text column 'make', which has no mean\n\nX =X.fillna(X.mean(numeric_only=True))\n\nX.head()<\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"341\" height=\"185\" src=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/4-mod.png\" alt=\"\" class=\"wp-image-1029\" srcset=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/4-mod.png 341w, https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/4-mod-300x163.png 300w\" sizes=\"auto, (max-width: 341px) 100vw, 341px\" \/><\/figure>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">y = dff[['price']]\n\ny =y.fillna(y.mean())\n\ny[0:5]<\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"120\" height=\"194\" 
src=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/5-mod.png\" alt=\"\" class=\"wp-image-1030\"\/><\/figure>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"linear-regression\">Linear regression<\/h5>\n\n\n\n<p>With the seaborn library we can draw a quick linear regression plot, in just one line of code, to see how the price increases with horsepower.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">ax = sns.regplot(x='horsepower', y='price', data=dff)<\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"456\" height=\"284\" src=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/6-mod.png\" alt=\"\" class=\"wp-image-1031\" srcset=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/6-mod.png 456w, https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/6-mod-300x187.png 300w\" sizes=\"auto, (max-width: 456px) 100vw, 456px\" \/><\/figure>\n\n\n\n<p>We define 2 helper functions so that we can draw the plots quickly and easily.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">def DistributionPlot(RedFunction,BlueFunction,RedName,BlueName,Title ):\n\n    width = 12\n\n    height = 10\n\n    plt.figure(figsize=(width, height))\n\n    # sns.distplot is deprecated in recent seaborn versions; sns.kdeplot draws the same kind of density curve\n\n    ax1 = sns.distplot(RedFunction, hist=False, color='r', label=RedName)\n\n    ax2 = sns.distplot(BlueFunction, hist=False, color='b', label=BlueName, ax=ax1)\n\n    plt.title(Title)\n\n    plt.xlabel('\u03a4\u03b9\u03bc\u03ae 
\u03c3\u03b5 \u03b4\u03bf\u03bb\u03bb\u03ac\u03c1\u03b9\u03b1')\n\n    plt.ylabel('\u03a7\u03b1\u03c1\u03b1\u03ba\u03c4\u03b7\u03c1\u03b9\u03c3\u03c4\u03b9\u03ba\u03ac')\n\n    plt.show()\n\n    plt.close()\n\ndef PollyPlot(xtrain,xtest,y_train,y_test,lr,poly_transform):\n\n    width = 12\n\n    height = 10\n\n    plt.figure(figsize=(width, height))\n\n    xmax=max([xtrain.values.max(),xtest.values.max()])\n\n    xmin=min([xtrain.values.min(),xtest.values.min()])\n\n    x=np.arange(xmin,xmax,0.1)\n\n    plt.plot(xtrain,y_train,'ro',label='Training \u03b4\u03b5\u03b4\u03bf\u03bc\u03ad\u03bd\u03b1')\n\n    plt.plot(xtest,y_test,'go',label='Test \u03b4\u03b5\u03b4\u03bf\u03bc\u03ad\u03bd\u03b1')\n\n    plt.plot(x,lr.predict(poly_transform.fit_transform(x.reshape(-1,1))),label='\u03a0\u03c1\u03bf\u03b2\u03bb\u03b5\u03c0\u03cc\u03bc\u03b5\u03bd\u03b1')\n\n    plt.ylim([-10000,60000])\n\n    plt.ylabel('Price')\n\n    plt.legend()<\/pre>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"\u03c7\u03c9\u03c1\u03b9\u03c3\u03bc\u03cc\u03c2-\u03b4\u03b5\u03b4\u03bf\u03bc\u03ad\u03bd\u03c9\u03bd-\u03c3\u03b5-train-\u03ba\u03b1\u03b9-test\">Split data into Train and Test<\/h5>\n\n\n\n<p>We divide the data into training and test with a percentage of 70-30.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)\n\nprint(\"number of test samples :\", X_test.shape[0])\n\nprint(\"number of training samples:\",X_train.shape[0])\n<\/pre>\n\n\n\n<pre class=\"wp-block-code\" data-no-translation=\"\" data-no-auto-translation=\"\"><code>number of test samples : 62\n\nnumber of training samples: 143<\/code><\/pre>\n\n\n\n<h5 class=\"wp-block-heading\" 
id=\"\u03b2\u03b1\u03b8\u03bc\u03bf\u03bb\u03cc\u03b3\u03b7\u03c3\u03b7-\u03c4\u03bf\u03c5-\u03bc\u03bf\u03bd\u03c4\u03ad\u03bb\u03bf\u03c5\">Scoring the model<\/h5>\n\n\n\n<p>We store an instance of the class we will use in a variable and train it on the training data. Then with score we see that R^2 on the test data is 0.33, which shows that the line does not pass close to most of the points.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">lre = LinearRegression()\n\nlre.fit(X_train[['horsepower']],y_train)\n\nlre.score(X_test[['horsepower']],y_test)<\/pre>\n\n\n\n<pre class=\"wp-block-code\" data-no-translation=\"\" data-no-auto-translation=\"\"><code>0.3331272902078515<\/code><\/pre>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"cross-validate\">Cross-validate<\/h5>\n\n\n\n<p>We can also score the model with cross-validation, as follows, by splitting the data into 4 folds and testing on each one in turn.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">Rcross=cross_val_score(lre,X[['horsepower']], y,cv=4)\n\nprint(\"The mean of the folds are\", Rcross.mean(),\"and the standard deviation is\" ,Rcross.std())<\/pre>\n\n\n\n<pre class=\"wp-block-code\" data-no-translation=\"\" data-no-auto-translation=\"\"><code>The mean of the folds are 0.4392710840512933 and the standard deviation is 0.16681254993011282<\/code><\/pre>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"\u03b7-\u03c0\u03c1\u03cc\u03b2\u03bb\u03b5\u03c8\u03b7-\u03c4\u03b7\u03c2-\u03c4\u03b9\u03bc\u03ae\u03c2\">Price 
prediction<\/h5>\n\n\n\n<p>Let&#039;s try to build a model that predicts the price from the characteristics of the car.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">lre = LinearRegression()\n\nlre.fit(X_train[['horsepower', 'AVG-mpg', 'diesel', 'gas']],y_train)\n\nlre.score(X_test[['horsepower', 'AVG-mpg', 'diesel', 'gas']],y_test)<\/pre>\n\n\n\n<pre class=\"wp-block-code\" data-no-translation=\"\" data-no-auto-translation=\"\"><code>0.40859095219326313<\/code><\/pre>\n\n\n\n<p>We store the predicted values for the training and the test data as yhat_train and yhat_test.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">yhat_train=lre.predict(X_train[['horsepower', 'AVG-mpg', 'diesel', 'gas']])\n\n# predictions on the test data, used by the distribution plot of the test set\n\nyhat_test=lre.predict(X_test[['horsepower', 'AVG-mpg', 'diesel', 'gas']])\n\nyhat_train[3:7]<\/pre>\n\n\n\n<pre class=\"wp-block-code\" data-no-translation=\"\" data-no-auto-translation=\"\"><code>array(&#91;&#91;21400.06711013],\n       &#91; 9861.84714747],\n       &#91; 4442.33687254],\n       &#91; 5784.42153314]])<\/code><\/pre>\n\n\n\n<p>We see that our model did not achieve good accuracy.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"\u03b7-\u03b1\u03ba\u03c1\u03af\u03b2\u03b5\u03b9\u03b1-\u03c4\u03bf\u03c5-\u03bc\u03bf\u03bd\u03c4\u03ad\u03bb\u03bf\u03c5\">The accuracy of the model<\/h5>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" 
data-no-auto-translation=\"\">Title='Train Data \u2013 \u0393\u03c1\u03ac\u03c6\u03b7\u03bc\u03b1 \u03c0\u03c1\u03bf\u03b2\u03bb\u03b5\u03c0\u03cc\u03bc\u03b5\u03bd\u03c9\u03bd data vs actual data'\n\nDistributionPlot(y_train,yhat_train,\"\u03a0\u03c1\u03b1\u03b3\u03bc\u03b1\u03c4\u03b9\u03ba\u03ac\",\"\u03a0\u03c1\u03bf\u03b2\u03bb\u03b5\u03c0\u03cc\u03bc\u03b5\u03bd\u03b1\",Title)<\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"840\" height=\"654\" src=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/7-mod.png\" alt=\"\" class=\"wp-image-1032\" srcset=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/7-mod.png 840w, https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/7-mod-300x234.png 300w, https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/7-mod-768x598.png 768w\" sizes=\"auto, (max-width: 840px) 100vw, 840px\" \/><\/figure>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">Title='Test Data \u2013 \u0393\u03c1\u03ac\u03c6\u03b7\u03bc\u03b1 \u03c0\u03c1\u03bf\u03b2\u03bb\u03b5\u03c0\u03cc\u03bc\u03b5\u03bd\u03c9\u03bd data vs actual data'<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">DistributionPlot(y_test,yhat_test,\"\u03a0\u03c1\u03b1\u03b3\u03bc\u03b1\u03c4\u03b9\u03ba\u03ac\",\"\u03a0\u03c1\u03bf\u03b2\u03bb\u03b5\u03c0\u03cc\u03bc\u03b5\u03bd\u03b1\",Title)<\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" 
decoding=\"async\" width=\"839\" height=\"660\" src=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/8-mod.png\" alt=\"\" class=\"wp-image-1033\" srcset=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/8-mod.png 839w, https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/8-mod-300x236.png 300w, https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/8-mod-768x604.png 768w\" sizes=\"auto, (max-width: 839px) 100vw, 839px\" \/><\/figure>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"polynomial-\u03b1\u03bb\u03bb\u03ac-\u03c4\u03b9-\u03b2\u03b1\u03b8\u03bc\u03bf\u03cd\">Polynomial but what degree?<\/h5>\n\n\n\n<p>Let&#039;s try a polynomial of degree 5 and see if we can get something better. This time we use only the horsepower.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">pr=PolynomialFeatures(degree=5)\n\nX_train_pr=pr.fit_transform(X_train[['horsepower']])\n\n# the transformer is already fitted on the training data, so the test data is only transformed\n\nX_test_pr=pr.transform(X_test[['horsepower']])\n\npoly = LinearRegression()\n\npoly.fit(X_train_pr, y_train)\n\nyhat=poly.predict(X_test_pr)\n\nprint(\"\u03a0\u03c1\u03bf\u03b2\u03bb\u03b5\u03c0\u03cc\u03bc\u03b5\u03bd\u03b5\u03c2 \u03c4\u03b9\u03bc\u03ad\u03c2:\", yhat[0:4])\n\nprint(\"\u03a0\u03c1\u03b1\u03b3\u03bc\u03b1\u03c4\u03b9\u03ba\u03ad\u03c2 \u03c4\u03b9\u03bc\u03ad\u03c2:\",y_test[0:4].values)<\/pre>\n\n\n\n<pre class=\"wp-block-code\" data-no-translation=\"\" data-no-auto-translation=\"\"><code>\u03a0\u03c1\u03bf\u03b2\u03bb\u03b5\u03c0\u03cc\u03bc\u03b5\u03bd\u03b5\u03c2 \u03c4\u03b9\u03bc\u03ad\u03c2: &#91;&#91; 5524.18191454]\n &#91;21532.75818567]\n &#91;14610.3150921 ]\n &#91; 
-995.04541995]]\n\u03a0\u03c1\u03b1\u03b3\u03bc\u03b1\u03c4\u03b9\u03ba\u03ad\u03c2 \u03c4\u03b9\u03bc\u03ad\u03c2: &#91;&#91; 6795.]\n &#91;15750.]\n &#91;15250.]\n &#91; 5151.]]<\/code><\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">PollyPlot(X_train[['horsepower']],X_test[['horsepower']],y_train,y_test,poly,pr)<\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"854\" height=\"619\" src=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/9-mod.png\" alt=\"\" class=\"wp-image-1023\" srcset=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/9-mod.png 854w, https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/9-mod-300x217.png 300w, https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/9-mod-768x557.png 768w\" sizes=\"auto, (max-width: 854px) 100vw, 854px\" \/><\/figure>\n\n\n\n<p>We see an improvement.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">poly.score(X_train_pr,y_train)<\/pre>\n\n\n\n<pre class=\"wp-block-code\" data-no-translation=\"\" data-no-auto-translation=\"\"><code>0.6830658437904327<\/code><\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">poly.score(X_test_pr,y_test)<\/pre>\n\n\n\n<pre 
class=\"wp-block-code\" data-no-translation=\"\" data-no-auto-translation=\"\"><code>0.6830658437904327<\/code><\/pre>\n\n\n\n<p>We can write a loop that repeats the process with different degrees so that we can choose the best one.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">Rsqu_test=[]\n\norder=[1,2,3,4]\n\nfor n in order:\n\n    pr=PolynomialFeatures(degree=n)\n\n    X_train_pr=pr.fit_transform(X_train[['horsepower']])\n\n    X_test_pr=pr.transform(X_test[['horsepower']])\n\n    lre.fit(X_train_pr,y_train)\n\n    Rsqu_test.append(lre.score(X_test_pr,y_test))\n\nplt.plot(order,Rsqu_test)\n\nplt.xlabel('order')\n\nplt.ylabel('R^2')\n\nplt.title('R^2 \u03bc\u03b5 \u03c7\u03c1\u03ae\u03c3\u03b7 Test Data')\n\nplt.text(3, 0.75, 'Maximum R^2 ') <\/pre>\n\n\n\n<pre class=\"wp-block-code\" data-no-translation=\"\" data-no-auto-translation=\"\"><code>Text(3, 0.75, 'Maximum R^2 ')<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"439\" height=\"476\" src=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/10-mod.png\" alt=\"\" class=\"wp-image-1024\" srcset=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/10-mod.png 439w, https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/10-mod-277x300.png 277w\" sizes=\"auto, (max-width: 439px) 100vw, 439px\" \/><\/figure>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"ridge-model\">Ridge model<\/h5>\n\n\n\n<p>We also do a final test with the Ridge model to see if we get even better accuracy.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" 
data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">pr=PolynomialFeatures(degree=2)\n\nX_train_pr=pr.fit_transform(X_train[['horsepower', 'AVG-mpg', 'diesel', 'gas']])\n\nX_test_pr=pr.fit_transform(X_test[['horsepower', 'AVG-mpg', 'diesel', 'gas']])\n\nRigeModel=Ridge(alpha=0.01)\n\nRigeModel.fit(X_train_pr, y_train)\n\nyhat=RigeModel.predict(X_test_pr)\n\nprint('predicted:', yhat[0:4])\n\nprint('test set :', y_test[0:4].values)<\/pre>\n\n\n\n<pre class=\"wp-block-code\" data-no-translation=\"\" data-no-auto-translation=\"\"><code>predicted: &#91;&#91; 6807.95323245]\n &#91;20468.43090602]\n &#91;14849.82259996]\n &#91;10178.31016628]]\ntest set : &#91;&#91; 6795.]\n &#91;15750.]\n &#91;15250.]\n &#91; 5151.]]<\/code><\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\" data-no-translation=\"\" data-no-auto-translation=\"\">Rsqu_test=[]\n\nRsqu_train=[]\n\ndummy1=[]\n\nALFA=5000*np.array(range(0,2))\n\nfor alfa in ALFA:\n\n    RigeModel=Ridge(alpha=alfa) \n\n    RigeModel.fit(X_train_pr,y_train)\n\n    Rsqu_test.append(RigeModel.score(X_test_pr,y_test))\n\n    Rsqu_train.append(RigeModel.score(X_train_pr,y_train))\n\nwidth = 12\n\nheight = 10\n\nplt.figure(figsize=(width, height))\n\nplt.plot(ALFA,Rsqu_test,label='validation data  ')\n\nplt.plot(ALFA,Rsqu_train,'r',label='training Data ')\n\nplt.xlabel('alpha')\n\nplt.ylabel('R^2')\n\nplt.legend()<\/pre>\n\n\n\n<pre class=\"wp-block-code\" data-no-translation=\"\" data-no-auto-translation=\"\"><code>&lt;matplotlib.legend.Legend at 0x7f32d8769358&gt;<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"801\" height=\"633\" 
src=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/11-mod.png\" alt=\"\" class=\"wp-image-1025\" srcset=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/11-mod.png 801w, https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/11-mod-300x237.png 300w, https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/11-mod-768x607.png 768w\" sizes=\"auto, (max-width: 801px) 100vw, 801px\" \/><\/figure>\n\n\n\n<p>After such a long post, I think you now have the general idea.<\/p>\n\n\n\n<p>These models will certainly seem difficult and complex at first, but because the code required is only a few lines, with use and experience working with them every day becomes easy.<\/p>","protected":false},"excerpt":{"rendered":"<p>What are models in Data Science? In a simple sentence, models are built so that we can make predictions about a trend we are investigating. There are two categories of models: supervised, which we train, and unsupervised, which is done through neural networks. In the article we will deal with the first category which has 3 subcategories regression, classification and [...]<\/p>","protected":false},"author":1,"featured_media":692,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13,14],"tags":[24,9],"class_list":["post-1022","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-datascience_ai","category-python","tag-data-analysis","tag-python"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>\u03a4\u03b9 \u03b5\u03af\u03bd\u03b1\u03b9 \u03c4\u03b1 \u03bc\u03bf\u03bd\u03c4\u03ad\u03bb\u03b1 \u03c3\u03c4\u03bf Data Science - DataPlatform.gr<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" 
href=\"https:\/\/www.dataplatform.gr\/en\/ti-einai-ta-montela-sto-data-science\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"\u03a4\u03b9 \u03b5\u03af\u03bd\u03b1\u03b9 \u03c4\u03b1 \u03bc\u03bf\u03bd\u03c4\u03ad\u03bb\u03b1 \u03c3\u03c4\u03bf Data Science - DataPlatform.gr\" \/>\n<meta property=\"og:description\" content=\"\u03a4\u03b9 \u03b5\u03af\u03bd\u03b1\u03b9 \u03c4\u03b1 models \u03c3\u03c4\u03bf Data Science; \u039c\u03b5 \u03bc\u03af\u03b1 \u03b1\u03c0\u03bb\u03ae \u03c0\u03c1\u03cc\u03c4\u03b1\u03c3\u03b7 \u03b5\u03af\u03bd\u03b1\u03b9 \u03bc\u03bf\u03bd\u03c4\u03ad\u03bb\u03b1 \u03c6\u03c4\u03b9\u03ac\u03c7\u03bd\u03bf\u03bd\u03c4\u03b1\u03b9 \u03ce\u03c3\u03c4\u03b5 \u03bd\u03b1 \u03bc\u03c0\u03bf\u03c1\u03bf\u03cd\u03bc\u03b5 \u03bd\u03b1 \u03ba\u03ac\u03bd\u03bf\u03c5\u03bc\u03b5 \u03c0\u03c1\u03bf\u03b2\u03bb\u03ad\u03c8\u03b5\u03b9\u03c2 \u03b3\u03b9\u03b1 \u03bc\u03b9\u03b1 \u03c4\u03ac\u03c3\u03b7 \u03c0\u03bf\u03c5 \u03b5\u03c1\u03b5\u03c5\u03bd\u03bf\u03cd\u03bc\u03b5. \u03a5\u03c0\u03ac\u03c1\u03c7\u03bf\u03c5\u03bd \u03b4\u03cd\u03bf \u03ba\u03b1\u03c4\u03b7\u03b3\u03bf\u03c1\u03af\u03b5\u03c2 \u03bc\u03bf\u03bd\u03c4\u03ad\u03bb\u03c9\u03bd \u03c4\u03b1&nbsp;supervised&nbsp;\u03c0\u03bf\u03c5 \u03c4\u03b1 \u03b5\u03ba\u03c0\u03b1\u03b9\u03b4\u03b5\u03cd\u03bf\u03c5\u03bc\u03b5 \u03b5\u03bc\u03b5\u03af\u03c2 \u03ba\u03b1\u03b9 \u03c3\u03c4\u03b1 unsupervised \u03c0\u03bf\u03c5 \u03b3\u03af\u03bd\u03b5\u03c4\u03b1\u03b9 \u03bc\u03ad\u03c3\u03c9 neural networks. 
\u03a3\u03c4\u03bf \u03ac\u03c1\u03b8\u03c1\u03bf \u03b8\u03b1 \u03b1\u03c3\u03c7\u03bf\u03bb\u03b7\u03b8\u03bf\u03cd\u03bc\u03b5 \u03bc\u03b5 \u03c4\u03b7\u03bd \u03c0\u03c1\u03ce\u03c4\u03b7 \u03ba\u03b1\u03c4\u03b7\u03b3\u03bf\u03c1\u03af\u03b1 \u03c0\u03bf\u03c5 \u03ad\u03c7\u03b5\u03b9 3 \u03c5\u03c0\u03bf\u03ba\u03b1\u03c4\u03b7\u03b3\u03bf\u03c1\u03af\u03b5\u03c2&nbsp;regression, classification \u03ba\u03b1\u03b9 [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.dataplatform.gr\/en\/ti-einai-ta-montela-sto-data-science\/\" \/>\n<meta property=\"og:site_name\" content=\"DataPlatform.gr\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/dataplatform.gr\/\" \/>\n<meta property=\"article:published_time\" content=\"2020-10-19T04:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-06-12T15:11:20+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/dp_datascience.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1280\" \/>\n\t<meta property=\"og:image:height\" content=\"720\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Stratos Matzouranis\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Stratos Matzouranis\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/ti-einai-ta-montela-sto-data-science\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/ti-einai-ta-montela-sto-data-science\\\/\"},\"author\":{\"name\":\"Stratos Matzouranis\",\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/#\\\/schema\\\/person\\\/e87bf4fd02b65cb6aa0942f87245bbaf\"},\"headline\":\"\u03a4\u03b9 \u03b5\u03af\u03bd\u03b1\u03b9 \u03c4\u03b1 \u03bc\u03bf\u03bd\u03c4\u03ad\u03bb\u03b1 \u03c3\u03c4\u03bf Data Science\",\"datePublished\":\"2020-10-19T04:00:00+00:00\",\"dateModified\":\"2025-06-12T15:11:20+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/ti-einai-ta-montela-sto-data-science\\\/\"},\"wordCount\":186,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/ti-einai-ta-montela-sto-data-science\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.dataplatform.gr\\\/wp-content\\\/uploads\\\/2020\\\/06\\\/dp_datascience.png\",\"keywords\":[\"Data Analysis\",\"Python\"],\"articleSection\":[\"Data Science &amp; Ai\",\"Python\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.dataplatform.gr\\\/ti-einai-ta-montela-sto-data-science\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/ti-einai-ta-montela-sto-data-science\\\/\",\"url\":\"https:\\\/\\\/www.dataplatform.gr\\\/ti-einai-ta-montela-sto-data-science\\\/\",\"name\":\"\u03a4\u03b9 \u03b5\u03af\u03bd\u03b1\u03b9 \u03c4\u03b1 \u03bc\u03bf\u03bd\u03c4\u03ad\u03bb\u03b1 \u03c3\u03c4\u03bf Data Science - 
DataPlatform.gr\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/ti-einai-ta-montela-sto-data-science\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/ti-einai-ta-montela-sto-data-science\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.dataplatform.gr\\\/wp-content\\\/uploads\\\/2020\\\/06\\\/dp_datascience.png\",\"datePublished\":\"2020-10-19T04:00:00+00:00\",\"dateModified\":\"2025-06-12T15:11:20+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/ti-einai-ta-montela-sto-data-science\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.dataplatform.gr\\\/ti-einai-ta-montela-sto-data-science\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/ti-einai-ta-montela-sto-data-science\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.dataplatform.gr\\\/wp-content\\\/uploads\\\/2020\\\/06\\\/dp_datascience.png\",\"contentUrl\":\"https:\\\/\\\/www.dataplatform.gr\\\/wp-content\\\/uploads\\\/2020\\\/06\\\/dp_datascience.png\",\"width\":1280,\"height\":720},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/ti-einai-ta-montela-sto-data-science\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\u0391\u03c1\u03c7\u03b9\u03ba\u03ae\",\"item\":\"https:\\\/\\\/www.dataplatform.gr\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data Science &amp; Ai\",\"item\":\"https:\\\/\\\/www.dataplatform.gr\\\/category\\\/datascience_ai\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"\u03a4\u03b9 \u03b5\u03af\u03bd\u03b1\u03b9 \u03c4\u03b1 \u03bc\u03bf\u03bd\u03c4\u03ad\u03bb\u03b1 \u03c3\u03c4\u03bf Data 
Science\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/#website\",\"url\":\"https:\\\/\\\/www.dataplatform.gr\\\/\",\"name\":\"dataplatform.gr - Sky is not the limit!\",\"description\":\"\u0398\u03b5\u03c9\u03c1\u03af\u03b1, \u03bf\u03b4\u03b7\u03b3\u03bf\u03af \u03ba\u03b1\u03b9 \u03c3\u03ba\u03ad\u03c8\u03b5\u03b9\u03c2 \u03b3\u03b9\u03b1 \u03bd\u03b1 \u03ba\u03ac\u03bd\u03b5\u03c4\u03b5 \u03c4\u03b7 \u03b4\u03bf\u03c5\u03bb\u03b5\u03b9\u03ac \u03c3\u03b1\u03c2 \u03c0\u03b9\u03bf \u03c0\u03b1\u03c1\u03b1\u03b3\u03c9\u03b3\u03b9\u03ba\u03ac \u03ba\u03b1\u03b9 \u03c0\u03b9\u03bf \u03b5\u03cd\u03ba\u03bf\u03bb\u03b1 \u03c0\u03ac\u03bd\u03c9 \u03c3\u03c4\u03b9\u03c2 \u03b2\u03ac\u03c3\u03b5\u03b9\u03c2 \u03b4\u03b5\u03b4\u03bf\u03bc\u03ad\u03bd\u03c9\u03bd, \u03c3\u03c4\u03b7\u03bd SQL, \u03c3\u03c4\u03bf Business Intelligence \u03ba\u03b1\u03b9 \u03c3\u03c4\u03b1 \u03b4\u03b5\u03b4\u03bf\u03bc\u03ad\u03bd\u03b1 \u03b3\u03b5\u03bd\u03b9\u03ba\u03cc\u03c4\u03b5\u03c1\u03b1.\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.dataplatform.gr\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/#organization\",\"name\":\"dataplatform.gr\",\"url\":\"https:\\\/\\\/www.dataplatform.gr\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.dataplatform.gr\\\/wp-content\\\/uploads\\\/2020\\\/06\\\/dp_logo_wbacki.png\",\"contentUrl\":\"https:\\\/\\\/www.dataplatform.gr\\\/wp-content\\\/uploads\\\/2020\\\/06\\\/dp_logo_wbacki.png\",\"width\":322,\"height\":139,\"caption\":\"dataplatform.
gr\"},\"image\":{\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/dataplatform.gr\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/dataplatform-gr\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.dataplatform.gr\\\/#\\\/schema\\\/person\\\/e87bf4fd02b65cb6aa0942f87245bbaf\",\"name\":\"Stratos Matzouranis\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/ab973bc4bd1673c43d45de5633a624d9ad13c06902dfdd5a6e3fd9885903865e?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/ab973bc4bd1673c43d45de5633a624d9ad13c06902dfdd5a6e3fd9885903865e?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/ab973bc4bd1673c43d45de5633a624d9ad13c06902dfdd5a6e3fd9885903865e?s=96&d=mm&r=g\",\"caption\":\"Stratos Matzouranis\"},\"sameAs\":[\"https:\\\/\\\/www.dataplatform.gr\"],\"url\":\"https:\\\/\\\/www.dataplatform.gr\\\/en\\\/author\\\/stratos-matzouranis\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"\u03a4\u03b9 \u03b5\u03af\u03bd\u03b1\u03b9 \u03c4\u03b1 \u03bc\u03bf\u03bd\u03c4\u03ad\u03bb\u03b1 \u03c3\u03c4\u03bf Data Science - DataPlatform.gr","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.dataplatform.gr\/en\/ti-einai-ta-montela-sto-data-science\/","og_locale":"en_US","og_type":"article","og_title":"\u03a4\u03b9 \u03b5\u03af\u03bd\u03b1\u03b9 \u03c4\u03b1 \u03bc\u03bf\u03bd\u03c4\u03ad\u03bb\u03b1 \u03c3\u03c4\u03bf Data Science - DataPlatform.gr","og_description":"\u03a4\u03b9 \u03b5\u03af\u03bd\u03b1\u03b9 \u03c4\u03b1 models \u03c3\u03c4\u03bf Data Science; \u039c\u03b5 \u03bc\u03af\u03b1 \u03b1\u03c0\u03bb\u03ae \u03c0\u03c1\u03cc\u03c4\u03b1\u03c3\u03b7 \u03b5\u03af\u03bd\u03b1\u03b9 \u03bc\u03bf\u03bd\u03c4\u03ad\u03bb\u03b1 \u03c6\u03c4\u03b9\u03ac\u03c7\u03bd\u03bf\u03bd\u03c4\u03b1\u03b9 \u03ce\u03c3\u03c4\u03b5 \u03bd\u03b1 \u03bc\u03c0\u03bf\u03c1\u03bf\u03cd\u03bc\u03b5 \u03bd\u03b1 \u03ba\u03ac\u03bd\u03bf\u03c5\u03bc\u03b5 \u03c0\u03c1\u03bf\u03b2\u03bb\u03ad\u03c8\u03b5\u03b9\u03c2 \u03b3\u03b9\u03b1 \u03bc\u03b9\u03b1 \u03c4\u03ac\u03c3\u03b7 \u03c0\u03bf\u03c5 \u03b5\u03c1\u03b5\u03c5\u03bd\u03bf\u03cd\u03bc\u03b5. \u03a5\u03c0\u03ac\u03c1\u03c7\u03bf\u03c5\u03bd \u03b4\u03cd\u03bf \u03ba\u03b1\u03c4\u03b7\u03b3\u03bf\u03c1\u03af\u03b5\u03c2 \u03bc\u03bf\u03bd\u03c4\u03ad\u03bb\u03c9\u03bd \u03c4\u03b1&nbsp;supervised&nbsp;\u03c0\u03bf\u03c5 \u03c4\u03b1 \u03b5\u03ba\u03c0\u03b1\u03b9\u03b4\u03b5\u03cd\u03bf\u03c5\u03bc\u03b5 \u03b5\u03bc\u03b5\u03af\u03c2 \u03ba\u03b1\u03b9 \u03c3\u03c4\u03b1 unsupervised \u03c0\u03bf\u03c5 \u03b3\u03af\u03bd\u03b5\u03c4\u03b1\u03b9 \u03bc\u03ad\u03c3\u03c9 neural networks. 
\u03a3\u03c4\u03bf \u03ac\u03c1\u03b8\u03c1\u03bf \u03b8\u03b1 \u03b1\u03c3\u03c7\u03bf\u03bb\u03b7\u03b8\u03bf\u03cd\u03bc\u03b5 \u03bc\u03b5 \u03c4\u03b7\u03bd \u03c0\u03c1\u03ce\u03c4\u03b7 \u03ba\u03b1\u03c4\u03b7\u03b3\u03bf\u03c1\u03af\u03b1 \u03c0\u03bf\u03c5 \u03ad\u03c7\u03b5\u03b9 3 \u03c5\u03c0\u03bf\u03ba\u03b1\u03c4\u03b7\u03b3\u03bf\u03c1\u03af\u03b5\u03c2&nbsp;regression, classification \u03ba\u03b1\u03b9 [&hellip;]","og_url":"https:\/\/www.dataplatform.gr\/en\/ti-einai-ta-montela-sto-data-science\/","og_site_name":"DataPlatform.gr","article_publisher":"https:\/\/www.facebook.com\/dataplatform.gr\/","article_published_time":"2020-10-19T04:00:00+00:00","article_modified_time":"2025-06-12T15:11:20+00:00","og_image":[{"width":1280,"height":720,"url":"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/dp_datascience.png","type":"image\/png"}],"author":"Stratos Matzouranis","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Stratos Matzouranis","Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.dataplatform.gr\/ti-einai-ta-montela-sto-data-science\/#article","isPartOf":{"@id":"https:\/\/www.dataplatform.gr\/ti-einai-ta-montela-sto-data-science\/"},"author":{"name":"Stratos Matzouranis","@id":"https:\/\/www.dataplatform.gr\/#\/schema\/person\/e87bf4fd02b65cb6aa0942f87245bbaf"},"headline":"\u03a4\u03b9 \u03b5\u03af\u03bd\u03b1\u03b9 \u03c4\u03b1 \u03bc\u03bf\u03bd\u03c4\u03ad\u03bb\u03b1 \u03c3\u03c4\u03bf Data Science","datePublished":"2020-10-19T04:00:00+00:00","dateModified":"2025-06-12T15:11:20+00:00","mainEntityOfPage":{"@id":"https:\/\/www.dataplatform.gr\/ti-einai-ta-montela-sto-data-science\/"},"wordCount":186,"commentCount":0,"publisher":{"@id":"https:\/\/www.dataplatform.gr\/#organization"},"image":{"@id":"https:\/\/www.dataplatform.gr\/ti-einai-ta-montela-sto-data-science\/#primaryimage"},"thumbnailUrl":"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/dp_datascience.png","keywords":["Data Analysis","Python"],"articleSection":["Data Science &amp; Ai","Python"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.dataplatform.gr\/ti-einai-ta-montela-sto-data-science\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.dataplatform.gr\/ti-einai-ta-montela-sto-data-science\/","url":"https:\/\/www.dataplatform.gr\/ti-einai-ta-montela-sto-data-science\/","name":"\u03a4\u03b9 \u03b5\u03af\u03bd\u03b1\u03b9 \u03c4\u03b1 \u03bc\u03bf\u03bd\u03c4\u03ad\u03bb\u03b1 \u03c3\u03c4\u03bf Data Science - 
DataPlatform.gr","isPartOf":{"@id":"https:\/\/www.dataplatform.gr\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.dataplatform.gr\/ti-einai-ta-montela-sto-data-science\/#primaryimage"},"image":{"@id":"https:\/\/www.dataplatform.gr\/ti-einai-ta-montela-sto-data-science\/#primaryimage"},"thumbnailUrl":"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/dp_datascience.png","datePublished":"2020-10-19T04:00:00+00:00","dateModified":"2025-06-12T15:11:20+00:00","breadcrumb":{"@id":"https:\/\/www.dataplatform.gr\/ti-einai-ta-montela-sto-data-science\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.dataplatform.gr\/ti-einai-ta-montela-sto-data-science\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.dataplatform.gr\/ti-einai-ta-montela-sto-data-science\/#primaryimage","url":"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/dp_datascience.png","contentUrl":"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/dp_datascience.png","width":1280,"height":720},{"@type":"BreadcrumbList","@id":"https:\/\/www.dataplatform.gr\/ti-einai-ta-montela-sto-data-science\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\u0391\u03c1\u03c7\u03b9\u03ba\u03ae","item":"https:\/\/www.dataplatform.gr\/"},{"@type":"ListItem","position":2,"name":"Data Science &amp; Ai","item":"https:\/\/www.dataplatform.gr\/category\/datascience_ai\/"},{"@type":"ListItem","position":3,"name":"\u03a4\u03b9 \u03b5\u03af\u03bd\u03b1\u03b9 \u03c4\u03b1 \u03bc\u03bf\u03bd\u03c4\u03ad\u03bb\u03b1 \u03c3\u03c4\u03bf Data Science"}]},{"@type":"WebSite","@id":"https:\/\/www.dataplatform.gr\/#website","url":"https:\/\/www.dataplatform.gr\/","name":"dataplatform.gr - Sky is not the limit!","description":"\u0398\u03b5\u03c9\u03c1\u03af\u03b1, \u03bf\u03b4\u03b7\u03b3\u03bf\u03af \u03ba\u03b1\u03b9 \u03c3\u03ba\u03ad\u03c8\u03b5\u03b9\u03c2 \u03b3\u03b9\u03b1 \u03bd\u03b1 
\u03ba\u03ac\u03bd\u03b5\u03c4\u03b5 \u03c4\u03b7 \u03b4\u03bf\u03c5\u03bb\u03b5\u03b9\u03ac \u03c3\u03b1\u03c2 \u03c0\u03b9\u03bf \u03c0\u03b1\u03c1\u03b1\u03b3\u03c9\u03b3\u03b9\u03ba\u03ac \u03ba\u03b1\u03b9 \u03c0\u03b9\u03bf \u03b5\u03cd\u03ba\u03bf\u03bb\u03b1 \u03c0\u03ac\u03bd\u03c9 \u03c3\u03c4\u03b9\u03c2 \u03b2\u03ac\u03c3\u03b5\u03b9\u03c2 \u03b4\u03b5\u03b4\u03bf\u03bc\u03ad\u03bd\u03c9\u03bd, \u03c3\u03c4\u03b7\u03bd SQL, \u03c3\u03c4\u03bf Business Intelligence \u03ba\u03b1\u03b9 \u03c3\u03c4\u03b1 \u03b4\u03b5\u03b4\u03bf\u03bc\u03ad\u03bd\u03b1 \u03b3\u03b5\u03bd\u03b9\u03ba\u03cc\u03c4\u03b5\u03c1\u03b1.","publisher":{"@id":"https:\/\/www.dataplatform.gr\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.dataplatform.gr\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.dataplatform.gr\/#organization","name":"dataplatform.gr","url":"https:\/\/www.dataplatform.gr\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.dataplatform.gr\/#\/schema\/logo\/image\/","url":"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/dp_logo_wbacki.png","contentUrl":"https:\/\/www.dataplatform.gr\/wp-content\/uploads\/2020\/06\/dp_logo_wbacki.png","width":322,"height":139,"caption":"dataplatform.gr"},"image":{"@id":"https:\/\/www.dataplatform.gr\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/dataplatform.gr\/","https:\/\/www.linkedin.com\/company\/dataplatform-gr\/"]},{"@type":"Person","@id":"https:\/\/www.dataplatform.gr\/#\/schema\/person\/e87bf4fd02b65cb6aa0942f87245bbaf","name":"Stratos 
Matzouranis","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/ab973bc4bd1673c43d45de5633a624d9ad13c06902dfdd5a6e3fd9885903865e?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/ab973bc4bd1673c43d45de5633a624d9ad13c06902dfdd5a6e3fd9885903865e?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/ab973bc4bd1673c43d45de5633a624d9ad13c06902dfdd5a6e3fd9885903865e?s=96&d=mm&r=g","caption":"Stratos Matzouranis"},"sameAs":["https:\/\/www.dataplatform.gr"],"url":"https:\/\/www.dataplatform.gr\/en\/author\/stratos-matzouranis\/"}]}},"_links":{"self":[{"href":"https:\/\/www.dataplatform.gr\/en\/wp-json\/wp\/v2\/posts\/1022","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dataplatform.gr\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dataplatform.gr\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dataplatform.gr\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dataplatform.gr\/en\/wp-json\/wp\/v2\/comments?post=1022"}],"version-history":[{"count":1,"href":"https:\/\/www.dataplatform.gr\/en\/wp-json\/wp\/v2\/posts\/1022\/revisions"}],"predecessor-version":[{"id":5746,"href":"https:\/\/www.dataplatform.gr\/en\/wp-json\/wp\/v2\/posts\/1022\/revisions\/5746"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.dataplatform.gr\/en\/wp-json\/wp\/v2\/media\/692"}],"wp:attachment":[{"href":"https:\/\/www.dataplatform.gr\/en\/wp-json\/wp\/v2\/media?parent=1022"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dataplatform.gr\/en\/wp-json\/wp\/v2\/categories?post=1022"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dataplatform.gr\/en\/wp-json\/wp\/v2\/tags?post=1022"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}