Monday, January 27, 2020
Beer as a social drink and its subsequent acceptance across the globe.
Introduction

The study will start by looking at the evolution of beer as a social drink and its subsequent acceptance across the globe. It will also investigate how different brands came in to channel beer communication and incorporated community activities like football, rugby and food as moments for beer consumption to increase product acceptance. We will also look at international festivals like the Oktoberfest and the Great British Beer Festival in the UK, which have developed as part of beer culture and helped spread the product through replicated festivals in various parts of the world. Primary research will be done through online surveys and interviews with respondents across Europe, North and South America, and Asia to understand consumer attitudes towards beer in these regions, and a comparative analysis will be done on their responses. Based on the insights, the study will investigate whether a similar model can be replicated in India for the nascent beer industry under the following heads:

* Which of the marketing and communication strategies used in other countries would / wouldn't work in India, and why?
* Opportunities for replication of festival models from other countries.
* Implications for Indian beer manufacturers and marketers, based on a comparative analysis of beer positioning and communication in different cultures.

Literature Review

A preliminary study of literature on the global beer industry and consumer behaviour revealed the following salient points:

* Research shows that beer is a non-food-specific drink compared to wine. It is more of a masculine, non-formal occasion drink, associated with fun and social events. For individual beer brands, it is important to focus on positioning and consumer engagement.
Beer has slowly become more fashionable to drink through its association with activities like football, rugby and rock music.

* Peer pressure plays a huge role in the consumption of alcohol. Alcohol is associated with a list of values: belonging, excitement, warm relationships, self-fulfilment, being well respected, fun and enjoyment, security, self-respect, and a sense of accomplishment. These are important cues for any company designing the marketing and communication strategy for its brand.

* In America, beer joints stress forming communities through engaging events and activities. The ambience of the place is also very critical, since beer consumption is all about having a good time.

* Forming a connection with the brand is also an important parameter when it comes to selling beer. As quoted by Mike Bristol, owner-founder of Bristol Brewing Co. in Colorado Springs: "A lot more people want to spend on a company that they have some common association with. They're local, they're in the community, and they're visible. Import beers don't seem to be doing well in their market or nationally, and I think that's a shift." Beer is also seen as a product whose consumption does not go down even in times of economic crisis.

* As per "Culinary Currents: Beer, Wine and Spirits" (Nation's Restaurant News, September 15, 2008), some myths about beer are:
  - Dark beer is heavy
  - Ale is stronger than lager
  - Stout is a meal in a glass
  - Imported beer is better than domestic beer
  - Wine is more complex than beer
  - Fruit beers are girly beers
  - All beer is best served ice-cold
  - Beer and fine dining don't mix

* Some craft beer makers have also tried to pair beer with specially crafted menus of cheese and seasonal food. Restaurants have even started experimenting with beer to create cocktails to increase the penetration and frequency of beer consumption.
This, though, could dilute the product personality of beer, which reflects not a classy, fine-dining experience but a more rugged and aggressive environment. This food-and-beer mix is primarily targeted at non-regular beer drinkers and first-timers.

* Some stats from the US market for March-April 2008 reveal interesting facts ("Category Insight, Beverage: Beer Demographics", Retail Merchandiser, April 2008, p. 10):
  - Beer was the fourth largest category in terms of US dollar sales in the edible grocery supermarket segment
  - 37% of US adults are regular beer consumers
  - 52% of total beer drinkers are aged 28 to 49, but relative to their share of the beer-consuming population, 21 to 27 year olds consume a disproportionate 41% of beer volume
  - 32% of beer drinkers shop for beer one to three times each week
  - 47% of beer shoppers buy wine in addition to beer; 41% add spirits
  - 70% are male
  - 84% are White, 10% Latino, 6% African American
  - 59% have an income above $59,000
  - Grocery is the most common beer channel choice at 46%
  - When available, shoppers overwhelmingly prefer to buy cold beer

* A brand study in two of the highest beer-consuming nations of the world, the Czech Republic and Britain, suggests that branding played an important role in the development of the organised beer market in these countries at a national level. The entire system was well structured, with organisational hierarchies in place as well as streamlined distribution channels. The regional brands, on the other hand, do not follow a very structured nationwide campaign. Although brand development for national brands is at similar levels in both countries, further down the bracket the branding of regional brands in the Czech Republic is less developed than in Britain. For Czech consumers, unlike British ones, brands were not as much of a consideration in the choice of public houses as the taste and freshness of the beer.
To sustain these smaller breweries, a rule was enacted in Britain obliging local public houses to sell the product of the local breweries, restricting the entry of national brands into these places. This rule, though, is not present in the Czech Republic, making branding more important there. To keep the beer industry safe in the Czech Republic, the breweries have kept the price of their beer lower than in other West European countries.

* In 2007, 7 million litres of beer were consumed at the Oktoberfest in Munich, Germany. The biggest cultural context of this festival is its symbolism of equality, as people from all classes and categories sit at the same table to enjoy their beer. It is the world's largest fair, attracting visitors in excess of 7 million from all over the world. Such is the pull of this festival that similar concepts have been replicated in other countries like Canada, Brazil, the USA and India.

* The "Whassup" campaign by Anheuser-Busch for Budweiser revolutionised beer advertising as it targeted the core group of 21-27 year old males who loved to hang out with friends over sporting events.

* The Indian consumer mindset can be divided into the following sub-heads:
  - Mind over Matter
  - The Functional over the Ornamental
  - Fear of Tomorrow
  - Enjoying the Ordinary
  - The Desire to Fit In

* In the UK, beer advertising has been moving from television towards more engaging media like the internet to deepen the customer experience. Companies like Stella Artois have invested in multi-layer brand experiences which try to connect with customers at a more personal level.

* Taking the case of Heineken, a lot of its global success can be attributed to its consistency in quality and uniformity of brand message everywhere. The marketing of Heineken is a combination of global feeling and local execution.
* In its 2004 report, the Global Status Report on Alcohol, the World Health Organisation (WHO) estimated there were 2 billion drinkers of alcohol on the planet. Trends suggest that for brands to become bigger, globalisation is the way forward. This becomes slightly easier as consumers in most developed countries and emerging economies are now well informed and, despite cultural differences, more open to international brands.

* The study of global drinking trends suggests emerging markets have much better growth rates than developed markets, where growth is static. Urbanisation, affluence and the influence of mass media are playing a major role in this growth. The availability of alcohol in supermarkets is also driving consumption. Beer stands fourth after carbonated drinks, tea and water in terms of share of throat in the world. Off-premise locations are the drivers of volume, whereas on-premise outlets are the drivers of value. In mature markets, growth will be driven by experiential marketing. Barmen and baristas in urban areas are acquiring celebrity-chef status.

* In traditional drinking, alcohol essentially signified a male's entry into adulthood and was associated with food. In the modern day, drinks have become more of an individual's style statement and identity. It is now important to be seen with the right drink for the right occasion. Communities, and association with them, have become more important than before. Another newly developing phenomenon is post-modern drinking, where connoisseurship, novelty and exclusivity take precedence. Themed drinking associated with specific cultures is also seeing good interest amongst the travelling class, who are exposed to different cultures frequently. Names like Guinness and Scotch whisky have become iconic as they are steeped deep in the local culture.
* According to the Euromonitor report of 2005, the following are the key drivers in the beverage industries of the major countries:
  - Australia: convenience and health; a mature market needing to add value; alcohol part of the culture
  - Brazil: status, sociability and convenience; a developing market with opportunities for growth and adding value; market vulnerable to economic volatility; beer and football key to national culture
  - China: affordability, convenience and status in cities; a developing market with huge urban potential; rural areas remain largely unchanged
  - France: convenience, sociability and status; traditional drinking culture being eroded by changing demands and globalisation
  - Germany: price, convenience and health; a mature market with opportunities to add value; interest in discounters among affluent and poor alike
  - Italy: sociability, status and health; a mature market adapting to changes but with traditional infrastructure
  - Japan: convenience, status and health; a mature, highly fragmented market and a source of innovation
  - Russia: affordability, convenience and status; high consumption of locally produced spirits as well as an increasing presence of global brands in the cities; high beer and vodka consumption; alcohol dependence an issue among rural male Russians
  - Spain: status, sociability and health; a directional market in terms of youth drinking trends; older drinkers stick to traditional drinking while the young drive the post-modern
  - UK: convenience, sociability and health; a mature market adding value through novelty; concentrated retail infrastructure
  - US: convenience, sociability and health; a mature market adding value through segmentation and premiumisation

* A few future trends which can be seen in the global drinks industry are health awareness, fusion drinking, artisan brands and connoisseurship, experiential marketing, and sociability.

* Specific to Germany, which has the third highest per capita beer consumption in the world, beer consumption has been slowly going down.
This is attributed to rising prices and the health consciousness of the drinking population. In turn, flavoured beer, non-alcoholic beer and malt-based ready-to-drink products are showing growth in consumption.

* A major development in recent years has been the role and involvement of women in purchasing drinks. Some international brands have started targeting women by creating flavoured beers for them. The communication strategy, though, still predominantly targets men.

* An econometric study in the US by Franke and Wilcox suggests that there is no significant correlation between beer advertising and alcohol consumption. All advertising does is make people aware of the brands available; it does not really affect the amount of beer consumed overall. A study by Waterson in the UK shows that although advertising spend increased 80% between 1978 and 1987, the actual sale of beer in this period fell by 14%. The study also included Sweden, which has banned alcohol advertising since 1979, with similar results.

* The April 2009 Euromonitor report on beer shows a global demand of 184.6 billion litres. In mature markets volumes are declining, but in value terms consumption is increasing. Laws on drinking and driving are encouraging the growth of low/non-alcoholic beer; it currently accounts for 2% of the global beer market but is showing a high growth rate, especially in Muslim countries. In Spain, this category already accounts for 20% of beer volumes. There is also a trend of moving away from conventional beer to niche segments like wheat beer and craft beer. Dark beer is also seeing a healthy revival in growth.

* Specific to India, beer consumption registered an increase of 700% between 1995 and 2007. Per capita expenditure on alcohol grew at twice the average rate of growth in expenditure over this period.
An average age of 24 in the country, combined with affluence, access to mass media and information, lowering entry barriers and high awareness levels, means a goldmine of an opportunity for alcohol companies. Retailing of wine and beer is now allowed in supermarkets in a lot of states, thereby reaching out to more potential consumers, especially women. This has also resulted in more and more urban households stocking alcohol at home, unlike earlier times. Finally, the major beer manufacturers will have to compete for an expanding but challenging global market, which will ask hard questions of the positions that global players occupy by category, price point and geography. India will form a major part of this strategy shift, and it is already visible in the number of beer brands that have entered the Indian market in the past two years.

All the research above treats beer as part of popular culture in developed markets. The challenge is to suggest a workable strategy for India, based on consumer insight, to tap the enormous potential the country offers. India today stands at the forefront of this opportunity, and hence it is important for international players to understand the cultural nuances of the Indian consumer before formulating their strategies for the market.

Conceptual Framework / Problem Definition

India has one of the lowest annual per capita consumption levels of beer in the world, at 1 litre. The biggest international names like InBev/Anheuser-Busch, Heineken and Carlsberg have already started making investments in the market. Carlsberg alone has invested close to $200 million in production facilities in the country. The other companies are entering the market through tie-ups with local players or by setting up their own breweries. Growing affluence and increased disposable incomes, along with the low average age of Indians, present a huge potential waiting to be tapped by these players.
Increased global travel and exposure to western media have led to changing attitudes towards alcohol. This is expected to boost beer sales, while shifting government policy on alcohol and reductions in taxes and duties present interesting opportunities for large domestic and multinational players alike. Some states have already allowed beer to be sold in supermarket formats, thus increasing the penetration of beer substantially. For international players, the race is on to establish local manufacturing facilities and distribution networks in order to gain first-mover advantage over other entrants. Currently the Indian market is dominated by local players, but the lack of other options has a major role to play in this. The curiosity and aspirational value attached to imported beer present a unique market for international players. Clear opportunities exist for companies that partner with local firms or set up their own breweries to get a head start in this dynamic market.

At this juncture it is of paramount importance for these companies to get their marketing and communication strategy right. This is all the more important because the Indian market and consumer present a challenge different from any other country in the world. Even within India, the cultural diversity is such that different strategies might be needed for different parts of the country. The literature reviewed here primarily consists of work done in developed beer markets, or of projected figures based on empirical data. The biggest gap in such projections is the lack of understanding of the Indian consumer. A number of international products launched in India on the back of such research have failed because of this.
This research will try to understand the cultural differences between the Indian beer drinker and the western beer drinker, and carry out a comparative analysis to gain insights that can be used to design the marketing and communications strategy of these international companies. Beer as a product has been successful in developed countries because of the community culture created amongst consumers. The research will help determine the key drivers and key characteristics of the Indian beer market.

Proposed Research Design

The research will be carried out by administering questionnaires to the beer-drinking community in urban India as well as to respondents in the USA, Canada, Germany, the UK, Colombia, Brazil, China, France, Poland, Finland, Slovakia, Lithuania and Korea. Detailed interviews will be carried out with some respondents in all these locations, through telephonic interviews or online interaction, to understand the culture of beer consumption there. An analysis will also be done comparing the communication of the world's top three beer brands in all these countries, to see the differences and similarities and how consumers absorb it. The Indian respondents will then be shown the communication used in all these countries, and insights will be drawn from their responses to each communication. This will give us insights into the cultural differences and similarities between the Indian consumer and their international counterparts. The sample will consist of at least 10 detailed interviews with international respondents and 10 in-depth interviews with Indian consumers. Questionnaires will be administered to 150 beer drinkers in India and 50 based abroad. The sample size of the questionnaire might increase based on the response of the target group.

Expected Contribution

The study, as mentioned earlier, will give a deep insight into the mindset of the urban Indian consumer with respect to beer.
It will also look at the associations the Indian consumer has with the alcohol industry in terms of perceptions, and specifically with beer. Responses to international communication will be recorded and analysed to define the key drivers and key characteristics of the Indian market. The final output, as mentioned in the introduction, will address the following heads:

* Which of the marketing and communication strategies used in other countries would / wouldn't work in India, and why?
* Opportunities for replication of festival models and other community-building activities from other countries.
* Implications for Indian beer manufacturers and marketers, based on a comparative analysis of beer positioning and communication in different cultures.
Sunday, January 19, 2020
Desi Arnaz
Desi Arnaz was the Cuban bandleader and singer turned savvy TV mogul who, after his marriage to comedienne Lucille Ball in 1940, parlayed their successful "I Love Lucy" series into the Desilu TV production empire, which in its heyday also produced the successful and highly lucrative "The Untouchables" and "Star Trek" series.

Desiderio Alberto Arnaz y de Acha III was born in 1917 to wealthy Cuban landowners. His father was also the mayor of the town they lived in, but that soon changed. At the age of 16, Desi and his mother had to flee to Miami because of Batista's overthrow of the Machado government in 1933.

When Desi arrived in America, it was a struggle for him and his mother. But soon after he arrived, he joined the Siboney Septet at the Roney Plaza. He started working with Xavier Cugat's band in 1937 and later put together his own rhumba band. His youthful good looks and engaging presence soon won him a featured spot in the 1939 Broadway musical "Too Many Girls", and the following year he was signed by RKO for the film version. On the movie set, he met his future wife, Lucille Ball. Later that year Desi and Lucy eloped to Connecticut and got married in a country club. Arnaz was featured in several films, mostly as a colorful Latin. Joining MGM, he won attention for his sole dramatic role in the war drama "Bataan" (1943), but gave up films to tour with his successful band. The marriage was subject to the strains of the road most of the time and of Lucy's movie career. When the couple came up with the idea for a television series, they fought to do it together to save their marriage. But the network didn't think the series would work with Desi being Cuban. That didn't stop Lucy and Desi: in the summer of 1950, they went on tour, performing for live audiences to prove that the show would work. Well, as you know, the rest is television history!

Desi turned the first $5,000 spent into millions in just four years.
He convinced the show's sponsor, Philip Morris, that Lucy having a baby on the show would give them great publicity. He was right: the birth of Little Ricky drew 44 million viewers (the swearing-in of the President that year drew only 22 million), and the story made headlines everywhere across America. As a successful executive and head of the couple's production company, Desilu, Arnaz pioneered a new way of producing TV shows, shooting each episode of "I Love Lucy" on film before a live studio audience.
Saturday, January 11, 2020
Classification-Based Data Mining Approach for Quality Control
Classification-Based Data Mining Approach for Quality Control in Wine Production

GUIDED BY: Jayshri Patel
SUBMITTED BY: Hardik Barfiwala

INDEX
1. Introduction to Wine Production
2. Objectives
3. Introduction to Dataset
4. Pre-Processing
5. Statistics Used in Algorithms
6. Algorithms Applied on Dataset
7. Comparison of Applied Algorithms
8. Applying Testing Dataset
9. Achievements

1. INTRODUCTION TO WINE PRODUCTION

* The wine industry has been growing well in the market over the last decade. However, the quality factor in wine has become the main issue in wine making and selling.
* To meet increasing demand, assessing the quality of wine is necessary for the wine industry, both to prevent tampering with wine quality and to maintain it.
* To remain competitive, the wine industry is investing in new technologies like data mining for analyzing taste and other properties of wine. Data mining techniques provide more than summaries: they yield valuable information such as patterns and relationships between wine properties and human taste, all of which can be used to improve decision making and optimize the chances of success in both marketing and selling.
* Two key elements in the wine industry are wine certification and quality assessment, which are usually conducted via physicochemical and sensory tests.
* Physicochemical tests are lab-based and are used to characterize physicochemical properties of wine such as its density, alcohol or pH values.
* Meanwhile, sensory tests such as taste preference are performed by human experts. Since taste is a particular property that indicates quality in wine, the success of the wine industry is greatly determined by consumer satisfaction in taste requirements.
* Physicochemical data have also been found useful in predicting human wine taste preference and classifying wine based on aroma chromatograms.

2. OBJECTIVES

* Modeling the complex human taste is an important focus in the wine industry.
* The main purpose of this study was to predict wine quality based on physicochemical data.
* This study was also conducted to identify outliers or anomalies in the sample wine set in order to detect spoiling of wine.

3. INTRODUCTION TO DATASET

To evaluate the performance of data mining, a dataset is taken into consideration. This section describes the source of the data.

* Source of Data
Prior to the experimental part of the research, the data was gathered from the UCI Data Repository. The UCI Repository of Machine Learning Databases and Domain Theories is a free Internet repository of analytical datasets from several areas. All datasets are text files provided with a short description. These datasets have received recognition from many scientists and are claimed to be a valuable source of data.

* Overview of Dataset

INFORMATION OF DATASET
- Title: Wine Quality
- Data Set Characteristics: Multivariate
- Number of Instances: white wine: 4898; red wine: 1599
- Area: Business
- Attribute Characteristics: Real
- Number of Attributes: 11 + output attribute
- Missing Values: N/A

* Attribute Information

Input variables (based on physicochemical tests):
* Fixed Acidity: Amount of tartaric acid present in wine (in mg per liter). Affects the taste, feel and color of wine.
* Volatile Acidity: Amount of acetic acid present in wine (in mg per liter). Its presence in wine is mainly due to yeast and bacterial metabolism.
* Citric Acid: Amount of citric acid present in wine (in mg per liter). Used to acidify wines that are too basic, and as a flavor additive.
* Residual Sugar: The concentration of sugar remaining after fermentation (in grams per liter).
* Chlorides: Level of chlorides added to wine (in mg per liter). Used to correct mineral deficiencies in the brewing water.
* Free Sulfur Dioxide: Amount of free sulfur dioxide present in wine (in mg per liter).
* Total Sulfur Dioxide: Amount of free and combined sulfur dioxide present in wine
(in mg per liter). Used mainly as a preservative in the winemaking process.
* Density: The density of wine is close to that of water; dry wine is lower and sweet wine is higher (in kg per liter).
* pH: Measures the quantity of acids present, the strength of the acids, and the effects of minerals and other ingredients in the wine.
* Sulphates: Amount of sodium metabisulphite or potassium metabisulphite present in wine (in mg per liter).
* Alcohol: Amount of alcohol present in wine (in percentage).

Output variable (based on sensory data):
* Quality (score between 0 and 10): white wine: 3 to 9; red wine: 3 to 8.

4. PRE-PROCESSING

* Pre-processing of Data
Preprocessing of the dataset is carried out before mining the data, to remove the various deficiencies in the data source. The following processes are carried out in preprocessing to make the dataset ready for the classification process.

* Data in the real world is dirty for the following reasons:
  - Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data. E.g. Occupation=""
  - Noisy: containing errors or outliers. E.g. Salary="-10"
  - Inconsistent: containing discrepancies in codes or names. E.g. Age="42" with Birthday="03/07/1997"; a rating that was "1, 2, 3" and is now "A, B, C"; discrepancies between duplicate records

* No quality data, no quality mining results! Quality decisions must be based on quality data, and a data warehouse needs consistent integration of quality data.

* The major tasks in data preprocessing are:
  - Data cleaning: fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies.
  - Data integration: integration of multiple databases, data cubes, or files. The dataset provided from the given data source is a single file, so there is no need to integrate the dataset.
  - Data transformation: normalization and aggregation. The dataset is already in normalized form, as it is in a single data file.
  - Data reduction: obtains a reduced representation in volume that produces the same or similar analytical results. The data volume of the given dataset is not very large and the different algorithms run easily on it, so reduction is not needed.
  - Data discretization: part of data reduction, but of particular importance, especially for numerical data.

* Need for data preprocessing on the wine quality data:
  - For this dataset, only data cleaning is required in pre-processing.
  - Here, the NumericToNominal, InterquartileRange and RemoveWithValues filters are used.

* NumericToNominal Filter (weka.filters.unsupervised.attribute.NumericToNominal)
  - A filter for turning numeric attributes into nominal ones.
  - In our dataset, the class attribute "Quality" in both datasets (red-wine quality, white-wine quality) has type "Numeric". After applying this filter, the class attribute "Quality" is converted to type "Nominal".
  - The red-wine quality dataset has class labels 3, 4, 5 ... 8 and the white-wine quality dataset has class labels 3, 4, 5 ... 9.
  - Because classification cannot be applied to a numeric class field, this filter is needed.

* InterquartileRange Filter (weka.filters.unsupervised.attribute.InterquartileRange)
  - A filter for detecting outliers and extreme values based on interquartile ranges. The filter skips the class attribute.
  - This filter is applied to all attribute indices with all default options.
  - After applying it, the filter adds two more fields named "Outlier" and "ExtremeValue", each with labels "no" and "yes". Here the "yes" label indicates that the instance is an outlier or extreme value.
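As an illustration, the interquartile-range rule WEKA applies can be sketched in a few lines of pandas. The factors 3 and 6 below are our reading of the filter's default OutlierFactor and ExtremeValuesFactor, and the column name in the usage comment is hypothetical; treat both as assumptions, not a transcript of WEKA's source.

```python
import pandas as pd

def flag_iqr(values, outlier_factor=3.0, extreme_factor=6.0):
    """Flag outliers and extreme values with the interquartile-range rule.

    A value is extreme if it lies beyond extreme_factor * IQR from the
    quartiles, and an outlier if it lies beyond outlier_factor * IQR but
    is not already extreme (mirroring the two flags WEKA adds).
    """
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    extreme = (values < q1 - extreme_factor * iqr) | (values > q3 + extreme_factor * iqr)
    flagged = (values < q1 - outlier_factor * iqr) | (values > q3 + outlier_factor * iqr)
    outlier = flagged & ~extreme  # outliers that are not already extreme
    return outlier, extreme

# Hypothetical usage on one attribute of the UCI white-wine file:
# wine = pd.read_csv("winequality-white.csv", sep=";")
# out, ext = flag_iqr(wine["volatile acidity"])
# cleaned = wine[~(out | ext)]   # roughly what RemoveWithValues does next
```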
* In our dataset, there are 83 extreme values and 125 outliers in the white-wine quality dataset, and 69 extreme values and 94 outliers in the red-wine quality dataset.

* RemoveWithValues Filter (weka.filters.unsupervised.instance.RemoveWithValues)
  - Filters instances according to the value of an attribute.
  - This filter has two options, "AttributeIndex" and "NominalIndices". AttributeIndex chooses the attribute to be used for selection, and NominalIndices chooses the range of label indices to be used for selection on that nominal attribute.
  - In our dataset, AttributeIndex is "last" and NominalIndices is also "last", so the filter first removes the 83 extreme values and then the 125 outliers in the white-wine quality dataset, and the 69 extreme values and 94 outliers in the red-wine quality dataset.
  - After applying this filter, both added fields are removed from the dataset.

* Attribute Selection

Ranking attributes using the attribute selection algorithm:

Red-wine attribute (index) | Rank (red) | Rank (white) | White-wine attribute (index)
Volatile_Acidity (2) | 0.1248 | 0.0406 | Volatile_Acidity (2)
Total_Sulfur_Dioxide (7) | 0.0695 | 0.0600 | Citric_Acidity (3)
Sulphates (10) | 0.1464 | 0.0740 | Chlorides (5)
Alcohol (11) | 0.2395 | 0.0462 | Free_Sulfur_Dioxide (6)
 | | 0.1146 | Density (8)
 | | 0.2081 | Alcohol (11)

* The selection of attributes is performed automatically by WEKA using the InfoGainAttributeEval method.
* The method evaluates the worth of an attribute by measuring the information gain with respect to the class.

5. STATISTICS USED IN ALGORITHMS

* Statistics Measures
Different algorithms can be used while performing data mining on different datasets using WEKA; some of them are described below along with the statistics measures used to evaluate them.

* Kappa Statistic
  - The kappa statistic, also called the kappa coefficient, is a performance criterion or index which compares the agreement from the model with that which could occur merely by chance.
  - Kappa is a measure of agreement normalized for chance agreement.
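As a concrete sketch, Cohen's kappa can be computed directly from the actual and predicted label lists. This is a plain-Python illustration of the definition, not the WEKA implementation that produced the tables below.

```python
from collections import Counter

def kappa(actual, predicted):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected purely by chance."""
    n = len(actual)
    p_o = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement: for each class, the probability that the model
    # and the ground truth both pick that class independently.
    p_e = sum(actual_counts[c] * predicted_counts.get(c, 0)
              for c in actual_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

For example, K-Star's kappa of 0.5365 on the white-wine data reported in this study falls in the "moderate" band of the range table.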
* The kappa statistic describes how close our predictions for the class attribute are to the actual values.
* Value ranges for kappa:

| Range | Result |
|---|---|
| < 0 | Poor |
| 0-0.20 | Slight |
| 0.21-0.40 | Fair |
| 0.41-0.60 | Moderate |
| 0.61-0.80 | Substantial |
| 0.81-1.0 | Almost perfect |

* As this range shows, if the kappa value is close to 1 then the predicted values are close to the actual values, so the applied algorithm is accurate.
* Kappa statistic values for the wine-quality datasets:

| Algorithm | White-wine Quality | Red-wine Quality |
|---|---|---|
| K-Star | 0.5365 | 0.5294 |
| J48 | 0.3813 | 0.3881 |
| Multilayer Perceptron | 0.2946 | 0.3784 |

* Mean absolute error (MAE)
* Mean absolute error is a quantity used to measure how close forecasts or predictions are to the eventual outcomes. It is given by

  MAE = (1/n) Σ_i |p_i − a_i|

  where p_i is the predicted value and a_i the actual value for instance i.
* Mean absolute error for the wine-quality datasets:

| Algorithm | White-wine Quality | Red-wine Quality |
|---|---|---|
| K-Star | 0.1297 | 0.1381 |
| J48 | 0.1245 | 0.1401 |
| Multilayer Perceptron | 0.1581 | 0.1576 |

* Root mean squared error (RMSE)
* If you have some data and try to make a curve (a formula) fit them, you can graph it and see how close the curve is to the points. Another measure of how well the curve fits the data is the root mean squared error.
* For each data point, CalGraph calculates the value of y from the formula, subtracts it from the data's y-value and squares the difference. All these squares are added up, the sum is divided by the number of data points, and finally CalGraph takes the square root. Written mathematically,

  RMSE = sqrt( (1/n) Σ_i (p_i − a_i)² )

* Root mean squared error for the wine-quality datasets:

| Algorithm | White-wine Quality | Red-wine Quality |
|---|---|---|
| K-Star | 0.2428 | 0.2592 |
| J48 | 0.3194 | 0.3354 |
| Multilayer Perceptron | 0.2887 | 0.3023 |

* Root relative squared error (RRSE)
* The root relative squared error is relative to what the error would have been had a simple predictor been used; more specifically, this simple predictor is just the average of the actual values.
* Thus the relative squared error takes the total squared error and normalizes it by dividing by the total squared error of the simple predictor. Taking the square root of the relative squared error reduces the error to the same dimensions as the quantity being predicted.
* Mathematically, the root relative squared error E_i of an individual program i is

  E_i = sqrt( Σ_j (P_ij − T_j)² / Σ_j (T_j − T_mean)² )

  where P_ij is the value predicted by program i for sample case j (out of n sample cases), T_j is the target value for sample case j, and T_mean = (1/n) Σ_j T_j.
* For a perfect fit the numerator equals 0 and E_i = 0, so the E_i index ranges from 0 to infinity, with 0 corresponding to the ideal.
* Root relative squared error for the wine-quality datasets:

| Algorithm | White-wine Quality | Red-wine Quality |
|---|---|---|
| K-Star | 78.1984 % | 79.309 % |
| J48 | 102.9013 % | 102.602 % |
| Multilayer Perceptron | 93.0018 % | 92.4895 % |

* Relative absolute error (RAE)
* The relative absolute error is very similar to the relative squared error, in that it is also relative to a simple predictor, again the average of the actual values. In this case, though, the error is the total absolute error instead of the total squared error: the relative absolute error takes the total absolute error and normalizes it by dividing by the total absolute error of the simple predictor.
* Mathematically, the relative absolute error E_i of an individual program i is

  E_i = Σ_j |P_ij − T_j| / Σ_j |T_j − T_mean|

  with P_ij, T_j and T_mean as above.
* For a perfect fit the numerator equals 0 and E_i = 0, so the E_i index ranges from 0 to infinity, with 0 corresponding to the ideal.
* Relative absolute error for the wine-quality datasets:

| Algorithm | White-wine Quality | Red-wine Quality |
|---|---|---|
| K-Star | 67.2423 % | 64.5286 % |
| J48 | 64.577 % | 65.4857 % |
| Multilayer Perceptron | 81.9951 % | 73.6593 % |

* Various rates
* There are four possible outcomes from a classifier. If the outcome of a prediction is p and the actual value is also p, it is called a true positive (TP); if the actual value is n, it is a false positive (FP). Conversely, a true negative (TN) occurs when both the prediction outcome and the actual value are n, and a false negative (FN) when the prediction outcome is n while the actual value is p.

| | P (actual) | N (actual) | Total |
|---|---|---|---|
| p′ (predicted) | true positive | false positive | P′ |
| n′ (predicted) | false negative | true negative | N′ |
| Total | P | N | |

* ROC curves
* While estimating the effectiveness and accuracy of a data mining technique it is essential to measure the error rate of each method. For binary classification tasks, where the error rate has to take both error components into consideration, ROC (Receiver Operating Characteristics) analysis is applied.
* The ROC curve plots the false positive rate on the x-axis against the true positive rate on the y-axis; the closer the ROC curve is to the top-left corner of the chart, the better the performance of the classifier. The curve selects the optimal model on the basis of the assumed class distribution.
* [Figure: sample ROC curve. Squares show performance with the model, triangles without; the line connecting a square with a triangle shows the benefit from using the model.]
* ROC curves are applicable e.g. in decision tree models or rule sets.
* Recall, precision and F-measure
* There are four possible results of classification, and different combinations of these four error and correct situations appear in the scientific literature on the topic; three popular notions are presented here. They are introduced because classification accuracy alone can be misleadingly high when negative examples dominate the data.
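Several of the measures defined in this section can be computed in a few lines. A minimal sketch, using a made-up 2-class confusion matrix (rows = actual, columns = predicted) and made-up predictions; only the formulas, not any of the report's WEKA figures, are reproduced here:

```python
import math

def kappa(confusion):
    """Cohen's kappa: observed agreement on the diagonal, corrected for
    the agreement expected by chance from the row/column totals."""
    n = sum(sum(row) for row in confusion)
    classes = range(len(confusion))
    observed = sum(confusion[i][i] for i in classes) / n
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in classes
    ) / (n * n)
    return (observed - expected) / (1 - expected)

def mae(actual, predicted):
    """Mean absolute error of predictions against actual values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: average the squared errors, take the root."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def rates(tp, fp, tn, fn):
    """True- and false-positive rates, the two axes of a ROC curve."""
    return tp / (tp + fn), fp / (fp + tn)

cm = [[40, 10],   # made-up counts: 40 + 35 of 100 instances correct
      [15, 35]]
k = kappa(cm)                              # agreement beyond chance
tpr, fpr = rates(tp=40, fp=15, tn=35, fn=10)
```

With these counts the observed agreement is 0.75 against a chance agreement of 0.5, which lands the kappa value in the "moderate" band of the table above.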
* Recall and precision of the classification are introduced to avoid this inflated-accuracy situation, and the F-measure is the harmonic mean of precision and recall. The formal definitions of these measures are:

  Precision = TP / (TP + FP)
  Recall = TP / (TP + FN)
  F-Measure = 2 / (1/Precision + 1/Recall)

* These measures were introduced especially in information-retrieval applications.
* Confusion matrix
* A matrix used to summarize the results of a supervised classification: entries along the main diagonal are correct classifications, while entries off the main diagonal are classification errors.

6. ALGORITHMS

* K-Nearest Neighbor classifiers
* Nearest neighbor classifiers are based on learning by analogy. The training samples are described by n numeric attributes, so each sample represents a point in an n-dimensional space, and all training samples are stored in this n-dimensional pattern space. Given an unknown sample, a k-nearest-neighbor classifier searches the pattern space for the k training samples closest to it; these are the k nearest neighbors of the unknown sample.
* "Closeness" is defined in terms of Euclidean distance, where the distance between two points X = (x1, …, xn) and Y = (y1, …, yn) is

  d(X, Y) = sqrt( Σ_i (x_i − y_i)² )

* The unknown sample is assigned the most common class among its k nearest neighbors; when k = 1, it is assigned the class of the training sample closest to it in pattern space.
* Nearest neighbor classifiers are instance-based or lazy learners in that they store all of the training samples and do not build a classifier until a new (unlabeled) sample needs to be classified.
* Lazy learners can incur expensive computational costs when the number of potential neighbors (i.e. stored training samples) with which to compare a given unlabeled sample is great, so they require efficient indexing techniques.
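The nearest-neighbor scheme described above can be sketched in a few lines. Note this is plain Euclidean k-NN with majority voting; WEKA's K* is also a lazy, instance-based learner but uses an entropy-based distance, so this is a simplified stand-in, and the feature values below are invented:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Return the majority class among the k training samples closest
    to x, with closeness measured by Euclidean distance."""
    dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    # Sort the stored samples by distance to x and keep the k nearest.
    nearest = sorted(zip(train_X, train_y), key=lambda s: dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Invented samples: (alcohol, volatile acidity) -> quality class.
X = [(9.0, 0.70), (9.2, 0.80), (12.5, 0.30), (12.8, 0.20), (13.0, 0.25)]
y = ["5", "5", "7", "7", "7"]
pred = knn_predict(X, y, (12.0, 0.30), k=3)
```

All the work happens at prediction time, which is exactly the lazy-learning trade-off the text describes: training is free, but every classification scans the stored samples.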
* As expected, lazy learning methods are faster at training than eager methods, but slower at classification, since all computation is delayed until that time. Unlike decision tree induction and back-propagation, nearest neighbor classifiers assign equal weight to each attribute, which may cause confusion when the data contains many irrelevant attributes.
* Nearest neighbor classifiers can also be used for prediction, i.e. to return a real-valued prediction for a given unknown sample. In this case the classifier returns the average of the real-valued labels associated with the k nearest neighbors of the unknown sample.
* In WEKA, an instance-based algorithm of this family is available as the KStar algorithm under classifiers > lazy (KStar uses an entropy-based distance measure rather than Euclidean distance).
* The result generated after applying K-Star on the White-wine Quality dataset (KStar options: -B 70 -M a; time taken to build model: 0.02 seconds; stratified cross-validation, 10-fold).
* Summary:

| Measure | Value |
|---|---|
| Correctly Classified Instances | 3307 (70.6624 %) |
| Incorrectly Classified Instances | 1373 (29.3376 %) |
| Kappa Statistic | 0.5365 |
| Mean Absolute Error | 0.1297 |
| Root Mean Squared Error | 0.2428 |
| Relative Absolute Error | 67.2423 % |
| Root Relative Squared Error | 78.1984 % |
| Total Number Of Instances | 4680 |

* Detailed accuracy by class:

| TP Rate | FP Rate | Precision | Recall | F-Measure | ROC Area | PRC Area | Class |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0.583 | 0.004 | 3 |
| 0.211 | 0.002 | 0.769 | 0.211 | 0.331 | 0.884 | 0.405 | 4 |
| 0.672 | 0.079 | 0.777 | 0.672 | 0.721 | 0.904 | 0.826 | 5 |
| 0.864 | 0.378 | 0.652 | 0.864 | 0.743 | 0.84 | 0.818 | 6 |
| 0.536 | 0.031 | 0.797 | 0.536 | 0.641 | 0.911 | 0.772 | 7 |
| 0.398 | 0.002 | 0.883 | 0.398 | 0.548 | 0.913 | 0.572 | 8 |
| 0 | 0 | 0 | 0 | 0 | 0.84 | 0.014 | 9 |
| 0.707 | 0.2 | 0.725 | 0.707 | 0.695 | 0.876 | 0.787 | Weighted Avg. |

* Confusion matrix (rows = actual class, columns = predicted):

| a | b | c | d | e | f | g | classified as |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 4 | 9 | 0 | 0 | 0 | a = 3 |
| 0 | 30 | 49 | 62 | 1 | 0 | 0 | b = 4 |
| 0 | 7 | 919 | 437 | 5 | 0 | 0 | c = 5 |
| 0 | 2 | 201 | 1822 | 81 | 2 | 0 | d = 6 |
| 0 | 0 | 9 | 389 | 468 | 7 | 0 | e = 7 |
| 0 | 0 | 0 | 73 | 30 | 68 | 0 | f = 8 |
| 0 | 0 | 0 | 3 | 2 | 0 | 0 | g = 9 |

* Performance of KStar with respect to the testing configuration for the White-wine Quality dataset:

| Testing method | Training set | Testing set | 10-fold cross-validation | 66 % split |
|---|---|---|---|---|
| Correctly Classified Instances | 99.6581 % | 100 % | 70.6624 % | 63.9221 % |
| Kappa statistic | 0.9949 | 1 | 0.5365 | 0.4252 |
| Mean Absolute Error | 0.0575 | 0.0788 | 0.1297 | 0.1379 |
| Root Mean Squared Error | 0.1089 | 0.145 | 0.2428 | 0.2568 |
| Relative Absolute Error | 29.8022 % | | 67.2423 % | 71.2445 % |

* The result generated after applying K-Star on the Red-wine Quality dataset (KStar options: -B 70 -M a; time taken to build model: 0 seconds; stratified cross-validation, 10-fold).
* Summary:

| Measure | Value |
|---|---|
| Correctly Classified Instances | 1013 (71.0379 %) |
| Incorrectly Classified Instances | 413 (28.9621 %) |
| Kappa Statistic | 0.5294 |
| Mean Absolute Error | 0.1381 |
| Root Mean Squared Error | 0.2592 |
| Relative Absolute Error | 64.5286 % |
| Root Relative Squared Error | 79.309 % |
| Total Number Of Instances | 1426 |

* Detailed accuracy by class:

| TP Rate | FP Rate | Precision | Recall | F-Measure | ROC Area | PRC Area | Class |
|---|---|---|---|---|---|---|---|
| 0 | 0.001 | 0 | 0 | 0 | 0.574 | 0.019 | 3 |
| 0 | 0.003 | 0 | 0 | 0 | 0.811 | 0.114 | 4 |
| 0.791 | 0.176 | 0.767 | 0.791 | 0.779 | 0.894 | 0.867 | 5 |
| 0.769 | 0.26 | 0.668 | 0.769 | 0.715 | 0.834 | 0.788 | 6 |
| 0.511 | 0.032 | 0.692 | 0.511 | 0.588 | 0.936 | 0.722 | 7 |
| 0.125 | 0.001 | 0.5 | 0.125 | 0.2 | 0.896 | 0.142 | 8 |
| 0.71 | 0.184 | 0.685 | 0.71 | 0.693 | 0.871 | 0.78 | Weighted Avg. |

* Confusion matrix (rows = actual class, columns = predicted):

| a | b | c | d | e | f | classified as |
|---|---|---|---|---|---|---|
| 0 | 1 | 4 | 1 | 0 | 0 | a = 3 |
| 1 | 0 | 30 | 17 | 0 | 0 | b = 4 |
| 0 | 2 | 477 | 120 | 4 | 0 | c = 5 |
| 0 | 1 | 103 | 444 | 29 | 0 | d = 6 |
| 0 | 0 | 8 | 76 | 90 | 2 | e = 7 |
| 0 | 0 | 0 | 7 | 7 | 2 | f = 8 |

* Performance of KStar with respect to the testing configuration for the Red-wine Quality dataset:

| Testing method | Training set | Testing set | 10-fold cross-validation | 66 % split |
|---|---|---|---|---|
| Correctly Classified Instances | 99.7895 % | 100 % | 71.0379 % | 70.7216 % |
| Kappa statistic | 0.9967 | 1 | 0.5294 | 0.5154 |
| Mean Absolute Error | 0.0338 | 0.0436 | 0.1381 | 0.1439 |
| Root Mean Squared Error | 0.0675 | 0.0828 | 0.2592 | 0.2646 |
| Relative Absolute Error | 15.8067 % | | 64.5286 % | 67.4903 % |

* J48 Decision Tree
* J48 is the WEKA class for generating a pruned or unpruned C4.5 decision tree. A decision tree is a predictive machine-learning model that decides the target value (dependent variable) of a new sample based on various attribute values of the available data.
* The internal nodes of a decision tree denote the different attributes; the branches between the nodes tell us the possible values these attributes can take in the observed samples, while the terminal nodes give the final value (classification) of the dependent variable.
* The attribute to be predicted is known as the dependent variable, since its value depends upon, or is decided by, the values of all the other attributes. The other attributes, which help in predicting the value of the dependent variable, are the independent variables of the dataset.
* The J48 decision tree classifier follows this simple algorithm: to classify a new item, it first creates a decision tree based on the attribute values of the available training data. Whenever it encounters a set of items (a training set), it identifies the attribute that discriminates the various instances most clearly.
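The "most discriminating" attribute is the one with the highest information gain with respect to the class, the same measure the InfoGainAttributeEval ranking used earlier. A toy sketch with invented low/high alcohol values:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """Information gain of a discretised feature with respect to the class:
    class entropy minus the weighted class entropy within each feature value."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - remainder

# Invented toy data: discretised alcohol level vs. quality class.
alcohol = ["low", "low", "high", "high", "high", "low"]
quality = ["5", "5", "7", "7", "5", "5"]
gain = info_gain(alcohol, quality)
```

The tree builder evaluates every candidate attribute this way at each node and splits on the one with the largest gain.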
* This feature, which tells us the most about the data instances and lets us classify them best, is said to have the highest information gain. Among the possible values of this feature, if there is any value for which there is no ambiguity (that is, for which the data instances falling within its category all have the same value for the target variable), we terminate that branch and assign it the target value obtained.
* For the other cases we look for another attribute that gives the highest information gain, and continue in this manner until we either get a clear decision about which combination of attributes gives a particular target value, or we run out of attributes. If we run out of attributes, or cannot get an unambiguous result from the available information, we assign the branch the target value possessed by the majority of the items under it.
* Once we have the decision tree, we follow its order of attribute selection: by checking the respective attributes and their values against those in the decision tree model, we can assign or predict the target value of a new instance.
* The result generated after applying J48 on the White-wine Quality dataset (time taken to build model: 1.4 seconds; stratified cross-validation, 10-fold).
* Summary:

| Measure | Value |
|---|---|
| Correctly Classified Instances | 2740 (58.547 %) |
| Incorrectly Classified Instances | 1940 (41.453 %) |
| Kappa Statistic | 0.3813 |
| Mean Absolute Error | 0.1245 |
| Root Mean Squared Error | 0.3194 |
| Relative Absolute Error | 64.5770 % |
| Root Relative Squared Error | 102.9013 % |
| Total Number Of Instances | 4680 |

* Detailed accuracy by class:

| TP Rate | FP Rate | Precision | Recall | F-Measure | ROC Area | Class |
|---|---|---|---|---|---|---|
| 0 | 0.002 | 0 | 0 | 0 | 0.30 | 3 |
| 0.239 | 0.020 | 0.270 | 0.239 | 0.254 | 0.699 | 4 |
| 0.605 | 0.169 | 0.597 | 0.605 | 0.601 | 0.763 | 5 |
| 0.644 | 0.312 | 0.628 | 0.644 | 0.636 | 0.689 | 6 |
| 0.526 | 0.099 | 0.549 | 0.526 | 0.537 | 0.766 | 7 |
| 0.363 | 0.022 | 0.388 | 0.363 | 0.375 | 0.75 | 8 |
| 0 | 0 | 0 | 0 | 0 | 0.496 | 9 |
| 0.585 | 0.21 | 0.582 | 0.585 | 0.584 | 0.727 | Weighted Avg. |

* Confusion matrix (rows = actual class, columns = predicted):

| a | b | c | d | e | f | g | classified as |
|---|---|---|---|---|---|---|---|
| 0 | 2 | 6 | 5 | 0 | 0 | 0 | a = 3 |
| 1 | 34 | 55 | 44 | 6 | 2 | 0 | b = 4 |
| 5 | 50 | 828 | 418 | 60 | 7 | 0 | c = 5 |
| 2 | 32 | 413 | 1357 | 261 | 43 | 0 | d = 6 |
| 1 | 7 | 76 | 286 | 459 | 44 | 0 | e = 7 |
| 1 | 1 | 10 | 49 | 48 | 62 | 0 | f = 8 |
| 0 | 0 | 0 | 1 | 2 | 2 | 0 | g = 9 |

* Performance of J48 with respect to the testing configuration for the White-wine Quality dataset:

| Testing method | Training set | Testing set | 10-fold cross-validation | 66 % split |
|---|---|---|---|---|
| Correctly Classified Instances | 90.1923 % | 70 % | 58.547 % | 54.8083 % |
| Kappa statistic | 0.854 | 0.6296 | 0.3813 | 0.33 |
| Mean Absolute Error | 0.0426 | 0.0961 | 0.1245 | 0.1347 |
| Root Mean Squared Error | 0.1429 | 0.2756 | 0.3194 | 0.3397 |
| Relative Absolute Error | 22.0695 % | | 64.577 % | 69.84 % |

* The result generated after applying J48 on the Red-wine Quality dataset (time taken to build model: 0.17 seconds; stratified cross-validation).
* Summary:

| Measure | Value |
|---|---|
| Correctly Classified Instances | 867 (60.7994 %) |
| Incorrectly Classified Instances | 559 (39.2006 %) |
| Kappa Statistic | 0.3881 |
| Mean Absolute Error | 0.1401 |
| Root Mean Squared Error | 0.3354 |
| Relative Absolute Error | 65.4857 % |
| Root Relative Squared Error | 102.602 % |
| Total Number Of Instances | 1426 |

* Detailed accuracy by class:

| TP Rate | FP Rate | Precision | Recall | F-Measure | ROC Area | Class |
|---|---|---|---|---|---|---|
| 0 | 0.004 | 0 | 0 | 0 | 0.573 | 3 |
| 0.063 | 0.037 | 0.056 | 0.063 | 0.059 | 0.578 | 4 |
| 0.721 | 0.258 | 0.672 | 0.721 | 0.696 | 0.749 | 5 |
| 0.57 | 0.238 | 0.62 | 0.57 | 0.594 | 0.674 | 6 |
| 0.563 | 0.064 | 0.553 | 0.563 | 0.558 | 0.8 | 7 |
| 0.063 | 0.006 | 0.1 | 0.063 | 0.077 | 0.691 | 8 |
| 0.608 | 0.214 | 0.606 | 0.608 | 0.606 | 0.718 | Weighted Avg. |

* Confusion matrix (rows = actual class, columns = predicted):

| a | b | c | d | e | f | classified as |
|---|---|---|---|---|---|---|
| 0 | 2 | 1 | 2 | 1 | 0 | a = 3 |
| 2 | 3 | 25 | 15 | 3 | 0 | b = 4 |
| 1 | 26 | 435 | 122 | 17 | 2 | c = 5 |
| 2 | 21 | 167 | 329 | 53 | 5 | d = 6 |
| 0 | 2 | 16 | 57 | 99 | 2 | e = 7 |
| 0 | 0 | 3 | 6 | 6 | 1 | f = 8 |

* Performance of J48 with respect to the testing configuration for the Red-wine Quality dataset:

| Testing method | Training set | Testing set | 10-fold cross-validation | 66 % split |
|---|---|---|---|---|
| Correctly Classified Instances | 91.1641 % | 80 % | 60.7994 % | 62.4742 % |
| Kappa statistic | 0.8616 | 0.6875 | 0.3881 | 0.3994 |
| Mean Absolute Error | 0.0461 | 0.0942 | 0.1401 | 0.1323 |
| Root Mean Squared Error | 0.1518 | 0.2618 | 0.3354 | 0.3262 |
| Relative Absolute Error | 21.5362 % | 39.3598 % | 65.4857 % | 62.052 % |

* Multilayer Perceptron
* The back-propagation algorithm performs learning on a multilayer feed-forward neural network, iteratively learning a set of weights for predicting the class label of tuples.
* A multilayer feed-forward neural network consists of an input layer, one or more hidden layers and an output layer, each made up of units. The inputs to the network correspond to the attributes measured for each training tuple and are fed simultaneously into the units making up the input layer. They pass through the input layer and are then weighted and fed simultaneously to a second layer of "neuron-like" units known as a hidden layer. The outputs of the hidden-layer units can be input to another hidden layer, and so on. The number of hidden layers is arbitrary, although in practice usually only one is used. The weighted outputs of the last hidden layer are input to the units making up the output layer, which emits the network's prediction for given tuples.
* The units in the input layer are called input units. The units in the hidden layers and output layer are sometimes referred to as neurodes, due to their symbolic biological basis, or as output units.
* The network is feed-forward in that none of the weights cycles back to an input unit or to an output unit of a previous layer. It is fully connected in that each unit provides input to each unit in the next forward layer.
* The result generated after applying the Multilayer Perceptron on the White-wine Quality dataset (time taken to build model: 36.22 seconds; stratified cross-validation).
* Summary:

| Measure | Value |
|---|---|
| Correctly Classified Instances | 2598 (55.5128 %) |
| Incorrectly Classified Instances | 2082 (44.4872 %) |
| Kappa statistic | 0.2946 |
| Mean absolute error | 0.1581 |
| Root mean squared error | 0.2887 |
| Relative absolute error | 81.9951 % |
| Root relative squared error | 93.0018 % |
| Total Number of Instances | 4680 |

* Detailed accuracy by class:

| TP Rate | FP Rate | Precision | Recall | F-Measure | ROC Area | PRC Area | Class |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0.344 | 0.002 | 3 |
| 0.056 | 0.004 | 0.308 | 0.056 | 0.095 | 0.732 | 0.156 | 4 |
| 0.594 | 0.165 | 0.597 | 0.594 | 0.595 | 0.798 | 0.584 | 5 |
| 0.704 | 0.482 | 0.545 | 0.704 | 0.614 | 0.647 | 0.568 | 6 |
| 0.326 | 0.07 | 0.517 | 0.326 | 0.4 | 0.808 | 0.474 | 7 |
| 0.058 | 0.002 | 0.5 | 0.058 | 0.105 | 0.8 | 0.169 | 8 |
| 0 | 0 | 0 | 0 | 0 | 0.356 | 0.001 | 9 |
| 0.555 | 0.279 | 0.544 | 0.555 | 0.532 | 0.728 | 0.526 | Weighted Avg. |

* Confusion matrix (rows = actual class, columns = predicted):

| a | b | c | d | e | f | g | classified as |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 5 | 7 | 1 | 0 | 0 | a = 3 |
| 0 | 8 | 82 | 50 | 2 | 0 | 0 | b = 4 |
| 0 | 11 | 812 | 532 | 12 | 1 | 0 | c = 5 |
| 0 | 6 | 425 | 1483 | 188 | 6 | 0 | d = 6 |
| 0 | 1 | 33 | 551 | 285 | 3 | 0 | e = 7 |
| 0 | 0 | 3 | 98 | 60 | 10 | 0 | f = 8 |
| 0 | 0 | 0 | 2 | 3 | 0 | 0 | g = 9 |

* Performance of the Multilayer Perceptron with respect to the testing configuration for the White-wine Quality dataset:

| Testing method | Training set | Testing set | 10-fold cross-validation | 66 % split |
|---|---|---|---|---|
| Correctly Classified Instances | 58.1838 % | 50 % | 55.5128 % | 51.3514 % |
| Kappa statistic | 0.3701 | 0.3671 | 0.2946 | 0.2454 |
| Mean Absolute Error | 0.1529 | 0.1746 | 0.1581 | 0.1628 |
| Root Mean Squared Error | 0.2808 | 0.3256 | 0.2887 | 0.2972 |
| Relative Absolute Error | 79.2713 % | | 81.9951 % | 84.1402 % |

* The result generated after applying the Multilayer Perceptron on the Red-wine Quality dataset (time taken to build model: 9.14 seconds; stratified cross-validation, 10-fold).
* Summary:

| Measure | Value |
|---|---|
| Correctly Classified Instances | 880 (61.7111 %) |
| Incorrectly Classified Instances | 546 (38.2889 %) |
| Kappa statistic | 0.3784 |
| Mean absolute error | 0.1576 |
| Root mean squared error | 0.3023 |
| Relative absolute error | 73.6593 % |
| Root relative squared error | 92.4895 % |
| Total Number of Instances | 1426 |

* Detailed accuracy by class:

| TP Rate | FP Rate | Precision | Recall | F-Measure | ROC Area | Class |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0.47 | 3 |
| 0.042 | 0.005 | 0.222 | 0.042 | 0.070 | 0.735 | 4 |
| 0.723 | 0.249 | 0.680 | 0.723 | 0.701 | 0.801 | 5 |
| 0.640 | 0.322 | 0.575 | 0.640 | 0.605 | 0.692 | 6 |
| 0.415 | 0.049 | 0.545 | 0.415 | 0.471 | 0.831 | 7 |
| 0 | 0 | 0 | 0 | 0 | 0.853 | 8 |
| 0.617 | 0.242 | 0.595 | 0.617 | 0.602 | 0.758 | Weighted Avg. |

* Confusion matrix (rows = actual class, columns = predicted):

| a | b | c | d | e | f | classified as |
|---|---|---|---|---|---|---|
| 0 | 0 | 5 | 1 | 0 | 0 | a = 3 |
| 0 | 2 | 34 | 11 | 1 | 0 | b = 4 |
| 0 | 2 | 436 | 160 | 5 | 0 | c = 5 |
| 0 | 5 | 156 | 369 | 47 | 0 | d = 6 |
| 0 | 0 | 10 | 93 | 73 | 0 | e = 7 |
| 0 | 0 | 0 | 8 | 8 | 0 | f = 8 |

* Performance of the Multilayer Perceptron with respect to the testing configuration for the Red-wine Quality dataset:

| Testing method | Training set | Testing set | 10-fold cross-validation | 66 % split |
|---|---|---|---|---|
| Correctly Classified Instances | 68.7237 % | 70 % | 61.7111 % | 58.7629 % |
| Kappa statistic | 0.4895 | 0.5588 | 0.3784 | 0.327 |
| Mean Absolute Error | 0.1426 | 0.1232 | 0.1576 | 0.1647 |
| Root Mean Squared Error | 0.2715 | 0.2424 | 0.3023 | 0.3029 |
| Relative Absolute Error | 66.6774 % | 51.4904 % | 73.6593 % | 77.2484 % |

* Result
* The classification experiment is measured by the percentage accuracy of classifying instances correctly into their class, where quality ranges between 0 (very bad) and 10 (excellent).
* From the experiments, classification of red-wine quality using the KStar algorithm achieved 71.0379 % accuracy, while the J48 classifier achieved about 60.7994 % and the Multilayer Perceptron classifier 61.7111 %. For white wine, KStar yielded 70.6624 % accuracy, J48 58.547 % and the Multilayer Perceptron 55.5128 %.
* These results lead us to conclude that KStar performs better on this classification task than the J48 and Multilayer Perceptron classifiers. KStar's processing time is also observed to be shorter, despite the large size of the wine-properties dataset.

7. COMPARISON OF DIFFERENT ALGORITHMS

* Comparison of all three algorithms on the White-wine Quality dataset (using 10-fold cross-validation):

| | KStar | J48 | Multilayer Perceptron |
|---|---|---|---|
| Time (sec) | 0 | 1.08 | 35.14 |
| Kappa statistic | 0.5365 | 0.3813 | 0.2946 |
| Correctly Classified Instances (%) | 70.6624 | 58.547 | 55.5128 |
| True Positive Rate (avg) | 0.707 | 0.585 | 0.555 |
| False Positive Rate (avg) | 0.2 | 0.21 | 0.279 |

* The chart (measures vs algorithms) compares the True Positive rate and kappa statistic of the three algorithms KStar, J48 and Multilayer Perceptron, and shows which algorithm best suits our dataset: the TP rate and kappa statistic of KStar are higher than those of the other two algorithms.
* The chart also shows that the False Positive Rate and Mean Absolute Error of the Multilayer Perceptron are high compared to the other two algorithms, so it is not a good fit for our dataset.
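All of the comparison figures above come from the 10-fold cross-validation protocol, which can be sketched as follows (unstratified and with a trivial baseline classifier for brevity; WEKA additionally stratifies the folds):

```python
from collections import Counter

def cross_val_accuracy(X, y, fit_predict, folds=10):
    """k-fold cross-validation accuracy: hold out each fold in turn,
    train on the remaining folds, and pool the correct predictions.
    `fit_predict(train_X, train_y, test_X)` wraps any classifier."""
    n = len(X)
    correct = 0
    for f in range(folds):
        held = set(range(f * n // folds, (f + 1) * n // folds))
        train_X = [x for i, x in enumerate(X) if i not in held]
        train_y = [c for i, c in enumerate(y) if i not in held]
        test_idx = sorted(held)
        preds = fit_predict(train_X, train_y, [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / n

def majority(train_X, train_y, test_X):
    """Baseline 'classifier': always predict the majority training class."""
    top = Counter(train_y).most_common(1)[0][0]
    return [top] * len(test_X)
```

Because every instance is tested exactly once on a model that never saw it, the pooled accuracy is a fairer basis for comparing KStar, J48 and the Multilayer Perceptron than training-set accuracy.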
* For the KStar algorithm these two values are the lowest, and the algorithm with the lowest FP rate and mean absolute error is the best-suited one.
* So we can finally conclude that KStar is the best-suited algorithm for the White-wine Quality dataset.
* Comparison of all three algorithms on the Red-wine Quality dataset (using 10-fold cross-validation):

| | KStar | J48 | Multilayer Perceptron |
|---|---|---|---|
| Time (sec) | 0 | 0.24 | 9.3 |
| Kappa statistic | 0.5294 | 0.3881 | 0.3784 |
| Correctly Classified Instances (%) | 71.0379 | 60.7994 | 61.7111 |
| True Positive Rate (avg) | 0.71 | 0.608 | 0.617 |
| False Positive Rate (avg) | 0.184 | 0.214 | 0.242 |

* KStar is also the best-suited algorithm for the Red-wine Quality dataset, because its TP rate and kappa statistic are higher, and its FP rate and mean absolute error lower, than those of the other algorithms.

8. APPLYING TESTING DATASET

Step 1: Load the pre-processed dataset.
Step 2: Go to the Classify tab, click the Choose button, select the lazy folder from the hierarchy and pick the KStar algorithm. Keep cross-validation at 10 folds, then build the model by clicking the Start button.
Step 3: Take any 10 or 15 records from the dataset and make their class value unknown by putting '?' in the corresponding row's class cell.
Step 4: Save this dataset as an .arff file.
Step 5: From the "Test options" panel select "Supplied test set", click the Set button and open the test dataset file just created.
Step 6: From the "Result list" panel select the KStar algorithm (since it performs better than the others on this dataset), right-click it and choose "Re-evaluate model on current test set".
Step 7: Right-click the KStar entry again and select "Visualize classifier errors".
Step 8: Click the Save button and save your test model.
Step 9: Once the test model is saved, a separate file is created containing the predicted values for your testing dataset.
Step 10: This test model holds the class values generated by re-evaluating the model on the test data for all instances whose class was set to unknown.

9. ACHIEVEMENT

* Classification models may be used as part of a decision-support system at different stages of wine production, giving the manufacturer the opportunity to take corrective and additive measures that result in higher-quality wine being produced.
* From the resulting classification accuracy, we found that the accuracy rate for white wine is influenced by a higher number of physicochemical attributes: alcohol, density, free sulfur dioxide, chlorides, citric acid and volatile acidity.
* Red-wine quality is highly correlated with only four attributes: alcohol, sulphates, total sulfur dioxide and volatile acidity.
* This shows that white-wine quality is affected by physicochemical attributes that do not, in general, affect red wine. I therefore suggest that white-wine manufacturers conduct a wider range of tests, particularly on density and chloride content, since white-wine quality is affected by these substances.
* The attribute selection algorithm we ran also ranked alcohol highest in both datasets, so alcohol level is the main attribute determining quality in both red and white wine.
* My suggestion is that wine manufacturers focus on maintaining a suitable alcohol content, perhaps through a longer fermentation period or a higher-yield fermenting yeast.
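For reference, the "unknown class" test file from Step 3 is an ordinary ARFF file whose class column holds '?'. A minimal sketch (the attribute list is abbreviated for illustration; the real files declare all eleven physicochemical attributes, and the data values here are invented):

```
@relation wine-quality-test

@attribute alcohol numeric
@attribute volatile_acidity numeric
@attribute Quality {3,4,5,6,7,8,9}

@data
9.4,0.70,?
12.8,0.28,?
```

Re-evaluating the saved KStar model on this supplied test set fills in a predicted Quality label for each '?' row.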