Saturday, March 28, 2020

Data mining Titanic dataset Essay Example

Data mining Titanic dataset

Submission date: 8/1/2013. Dated: 29/12/2012.

The database corresponds to the sinking of the Titanic on 15 April 1912. It is part of a database containing the passengers and crew who were aboard the ship, and various attributes relating to them. The purpose of this task is to apply the CRISP-DM methodology and follow the phases and tasks of that model. Using the classification method in RapidMiner and both the decision tree and k-NN algorithms, I will create a training model and try to predict the class: survived or didn't survive. If I apply a decision tree to the dataset as it is, I get a prediction rate of 78%. I will try various techniques throughout this report to increase the overall prediction rate.

Data mining objectives: I would like to explore the preconceived ideas I have about the sinking of the Titanic, and find out whether they are correct. Was there a majority of 3rd class passengers who died? What was the ratio of passengers who died, male or female? Did the location of cabins make a difference as to who survived? Did chivalry ring true, and did "women and children first" actually happen?

Data Understanding:

Describe the data:

[Figure]

Class label: Survived (1 or 0); 1 = survived, 0 = died. Type: binominal. Total: 891. Survived: 342. Died: 549.

Attributes: 10 attributes, 891 rows.

The dataset has primarily categorical attributes, so there is low information content. This might indicate a decision tree would be an appropriate model to use.
I can see that the number of rows in the dataset is comfortably more than 10 to 20 times the number of columns, so the number of instances is adequate. There don't seem to be any inconsistencies in the data.

Pclass: 1st, 2nd, or 3rd class. Type: polynominal (categorical). 3rd class: 491, 2nd class: 216, 1st class: 184. 0 missing.
Name: name of the passenger.
Sex: male, female. Type: binominal. Male: 577, female: 314. 0 missing.
Age: from 0.42 to 80. Type: real. Average age: 29, standard deviation about 14, maximum 80. 177 missing.
SibSp (siblings or spouses on board): Type: integer. Average less than 1, highest 8. This suggested an outlier, but on inspection the names of the passengers with 8 siblings corresponded. (The name was Sage; they were 3rd class passengers and all died.) 0 missing.
Parch (number of parents or children on board): Type: integer. Average: 0.3, deviation 0.8, maximum 6. 0 missing.
Ticket: ticket number. Type: polynominal. To me these ticket numbers seem quite random and my first inclination is to discard them. 0 missing.
Fare: cost of ticket. Type: real. Average: 32, deviation about 49, maximum 512. There seems to be quite a disparity in the range of values here. Three tickets cost 512; are they outliers? 0 missing.
Cabin: cabin numbers. Type: polynominal. 687 missing.

From looking at this data I think I can discount one of my initial questions, the one about cabin numbers. If there were more data, cabin location might be an interesting factor in survival. As it stands the quality of the data is not good: there are just too many missing entries (687 of 891, about 77%, well above 40%). So I will delete (filter out) the cabin attribute from the dataset. The age attribute could also cause a problem because of the number of missing fields. There are too many rows affected to delete them. I might use the average of all ages to fill in the blanks.

Explore the data: From an initial exploration of the data, I was able to look at various plots and found some interesting results. I have tried to keep my findings to the initial questions that I wanted answered.
Was there a majority of 3rd class passengers who died? You can tell from Figure 2 that this was true. This graph just shows survival by class, with 3rd class faring the worst.

Again this is shown with a scatter plot, but with the added attribute sex. You can see on the female side of the first class passengers that only a few died. Interestingly, it shows that it was mostly male 3rd class passengers who perished, and that more males than females died. There is a clear division between the classes.

This graph answers my other question: what was the ratio of passengers who died, male or female? From this we can see that it was mainly males who did not survive. Of the 577 males on board, about 460 perished. Of the 314 females, about 235 survived.

Another attribute that needs attention is age. I wanted to find out if the "women and children first" policy was adhered to, but there are 177 missing age values, which is going to complicate my results. Leaving the 177 as they are, I get the graph in Figure 5, but this is not conclusive. I thought that the fare might indicate a children's price and therefore allow me to fill in an age, but the fare doesn't seem to follow much of a pattern. Another idea I thought might help would be to look at the names of passengers, i.e. "Miss" might signify a lower age. (In 1912 the average age of marriage was 22, so anyone with the title "Miss" could have an age less than 22.) Names which include "Master" might indicate a young age as well. Figure 5 also indicates possible outliers on the right-hand side.

From this graph I could easily see the breakdown of the different classes of passenger and where they embarked. It is obvious that Southampton had the largest number of passengers get on board. Queenstown had the highest proportion of 3rd class passengers compared to 2nd and 1st class at that port, and it's also interesting to note that this was an Irish port.
This graph further explores the port of embarkation and shows the survival rate from each, as well as the different classes. To me it seems that the majority of the 3rd class passengers who were lost came from Southampton, although that port did have the highest number of 3rd class passengers.

A closer look at Southampton: the majority who didn't survive were 3rd class (blue). Also noted is the handful of 1st class passengers (green) who died, even though Southampton had the highest number of 1st class passengers to board. See Figure 6.

Verify data quality: There were a number of missing values in the dataset. The highest amount of missing data came from the cabin attribute. As it is higher than 45% (687 of 891 missing, about 77%), I decided to filter out this column. There are also 177 missing values in the age attribute. This amount of missing data (about 20%) is again too large a percentage to ignore and needs to be filled in. I can see that the dataset contains fewer than 1000 rows, so I think that sampling will not have to be performed. There don't seem to be any inconsistencies in the data.

There are still 2 missing pieces of information in the embarked attribute. I see that both are 1st class passengers, so from my graph on embarkation I think I can set the first passenger's port to Cherbourg. The other passenger is a George Nelson, whom I will add to Southampton.

I decided to filter out names also, as I don't see how they can help in the dataset. The name may have helped with age, by looking at the title as I said, but for this I just used the average age to replace the missing values. Another approach to filling in the missing age fields might be linear regression.

Remove possible outliers? I can see that there may be some outliers. For instance, in the fare attribute there are three tickets which cost 512 when the average is 32. They were first class tickets, but the difference is huge.

Data Preparation: Here is the result of using X-Validation on the dataset before any data preparation has taken place.
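For readers without RapidMiner, the idea behind X-Validation (k-fold cross-validation) can be sketched in plain Python. This is only an illustration and not the RapidMiner operator: the function names, the seed, and the "always predict the majority class" stand-in model are my own assumptions. The only figures taken from this report are the class counts (549 died, 342 survived).

```python
import random
from collections import Counter


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and deal them into k folds of near-equal size.

    The seed is an arbitrary assumption so that the split is repeatable.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def majority_label(labels):
    """Return the most frequent label (the stand-in 'model' for this sketch)."""
    return Counter(labels).most_common(1)[0][0]


def cross_validate_baseline(labels, k, seed=0):
    """Average accuracy over k folds of a majority-class baseline.

    For each fold: 'train' on the other k-1 folds (find their majority
    label), then score that single guess on the held-out fold.
    """
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        guess = majority_label(train_labels)
        correct = sum(1 for i in test_idx if labels[i] == guess)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Class counts from the report: 549 died (0) and 342 survived (1).
labels = [0] * 549 + [1] * 342
baseline_accuracy = cross_validate_baseline(labels, k=10)
print(f"majority-class baseline accuracy: {baseline_accuracy:.3f}")
```

A baseline like this is a useful yardstick: any real model, such as the decision tree, should beat the accuracy obtained by always predicting "died".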
I will now sort out the problem of the 687 missing cabin numbers. With the proportion being higher than 40%, I've decided to delete the attribute entirely. I've also deleted the name attribute, as I don't see how it will help. By deleting cabin, name and ticket, here is the result I get:

I replaced the missing age fields with the average of the ages. This increased the accuracy slightly and gave these results with X-Validation:

I used the Detect Outlier operator, picked the top ten outliers and then filtered them out. This gave this result:

The class recall for survived has not improved much. Increasing the number of neighbors in the Detect Outlier operator improved things, and limiting the filter to deleting 5 outliers also gave a better accuracy.

I decided to use specified binning for the ages and broke the ages into three bins: children aged up to 13, middle aged from 13 to 45, and older from 45 to 80. I tried different age ranges and found that these ranges yielded the best results. It did increase the accuracy. I also used binning for the fares, splitting them into low, mid, and high, which also improved the results in the confusion matrix.

To summarise the preparation: I used Detect Outlier to find the ten most obvious outliers and then used a filter to get rid of them. I removed cabin from the dataset. For the 177 missing age values I tried various approaches. I changed the missing ages to the average age, but this gives a spike in the number of passengers aged 29.7. Example of the average age problem:

Modeling: I tried to implement both the decision tree and k-NN algorithms, seeing as the dataset is primarily categorical. I found that k-NN yielded the best results regarding accuracy. It was first set at k = 1. The accuracy was not great at 73%. This value of k is too small and may be influenced by noise.

k-NN: k = 5 worked the best at 82.38%. This seems to be the optimal value for k, and the distance is set right. Class precision is about even on each class.
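The age handling and the k-NN step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not what RapidMiner does internally: the function names are hypothetical, the treatment of the boundary ages (13 and 45 go into the older bin) is my own choice, the Hamming distance is just one simple distance for categorical attributes, and the six training rows are made-up toy data rather than rows from the Titanic dataset.

```python
from collections import Counter


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def bin_age(age):
    """Three bins from the report: up to 13, 13 to 45, 45 to 80.

    Assumption: ages exactly on a boundary go into the older bin.
    """
    if age < 13:
        return "child"
    if age < 45:
        return "middle"
    return "older"


def hamming(row_a, row_b):
    """Number of attributes on which two categorical rows differ."""
    return sum(1 for x, y in zip(row_a, row_b) if x != y)


def knn_predict(train_rows, train_labels, query, k):
    """Majority vote among the k training rows closest to the query."""
    ranked = sorted(range(len(train_rows)),
                    key=lambda i: hamming(train_rows[i], query))
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up rows of (sex, class, age bin); 1 = survived, 0 = died.
toy_rows = [
    ("female", "1", "middle"),
    ("female", "2", "child"),
    ("male", "3", "middle"),
    ("male", "3", "middle"),
    ("male", "2", "older"),
    ("female", "3", "middle"),
]
toy_labels = [1, 1, 0, 0, 0, 0]

query = ("male", "3", bin_age(8))
for k in (1, 3):
    print(k, knn_predict(toy_rows, toy_labels, query, k))
```

With k = 1 the prediction depends on a single nearest row, which is why a small k is sensitive to noise. A larger odd k lets several neighbours vote and avoids ties between the two classes.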
Decision tree: The decision tree algorithm didn't give me as much accuracy, and I found that turning off pre-pruning gave me a better accuracy. In the decision tree, the age binning seemed to predict middle-aged males (13 to 45) with a low fare well. The class recall for survived was not great at 67.85%.

Generate Test Design: I used X-Validation to perform cross-validation on the data. I initially used 20 for the number of validations, but then found 25 achieved a better result. I used the Apply Model and Performance operators, as these are best used for classification tasks and work well with the polynominal attribute. This then presented me with a confusion matrix where I could measure the quality of my model by comparing the accuracy, recall and precision. I found that, throughout my various testing of operators and evaluating of the confusion matrix, raising the class recall on true 1 (survived) was the most difficult. After all my efforts I managed to raise it to 73.6%, i.e. 91 survivors were incorrectly predicted as not surviving.

Figure: Final result

Workspace: From my initial objectives I was able to determine the answers using RapidMiner. I wanted to find out if those who perished were in the majority 3rd class passengers. I found this to be true, and also that the majority who died were male 3rd class passengers. Female passengers and children fared better than most, which leads me to believe that the rule of "women and children first" applied. This may have been weighted more toward the first and second class passengers, as demonstrated in Figure 3. Because the dataset had such a large amount of data missing concerning age, this was more difficult to determine.

I found the embarked attribute to be interesting in the graphs I could generate from it. There seemed to be a large number of 3rd class passengers who died that had embarked from Southampton. If all the cabin numbers were present, I wonder whether it would show that Southampton 3rd class passengers had cabins close to where the iceberg hit. Did this have a bearing on their survival?

Of the different algorithms I used, I found that k-NN yielded the better results.
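The accuracy, precision and recall figures read off the confusion matrix can be reproduced by hand. The sketch below is a plain Python illustration with hypothetical function names and a ten-row toy example; the toy labels are made up and are not the results from this report.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted survived, actually died
        elif a == positive:
            fn += 1  # predicted died, actually survived
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Accuracy, class precision and class recall for the positive class."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy, made-up labels: 1 = survived, 0 = died.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(metrics(*confusion_counts(actual, predicted)))
```

Recall for the survived class only drops when an actual survivor is predicted as died, which is why it can stay low even when the overall accuracy looks reasonable.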

Saturday, March 7, 2020

Free Essays on Huckleberry Finn

In The Adventures of Huckleberry Finn, Mark Twain designates Huck as an outsider in order to supply him with an honest perspective on early nineteenth-century American society’s position on slavery. Twain initially reveals society’s stance on slavery through the outcast by presenting Huck’s misgivings about assisting Jim to freedom. Huck’s convictions reveal that society instilled the notion that slaves were property and should not escape to freedom. Also, Huck comments that he would rather go to hell than turn Jim over to the authorities, further revealing the idea promoted by civilization that helping a slave was a moral issue resulting in eternal damnation. Additionally, Huck has difficulty humbling himself and apologizing to Jim after the separation in the fog. This dramatic scene highlights the early nineteenth-century doctrine that slaves were property first and human beings second. Moreover, as Huck apologizes to Jim, he breaks every societal code or standard regarding the treatment of slaves by humbling himself before a slave. Also, Huck is surprised by Tom’s willingness to aid him in the rescue and release of Jim. Huck’s reaction continues to display the societal beliefs pertaining to slavery, because Huck expects condemnation for his actions from Tom, as from the rest of civilization. However, it is later revealed that Tom committed to aid Huck based on the knowledge that Jim was already a free man by Miss Watson’s will, thus demonstrating that society would disapprove of Huck’s assistance to Jim. Twain uses Huckleberry Finn’s interactions with Jim on the Mississippi River to reveal society’s perspective on slavery....

Essay

1. Huck was raised in the South, where blacks aren’t free and racism is taught, but despite all this Huck turned out to be a boy who doesn’t fully believe in racism, because his friendship with a black man, Jim, let him look past race.

2. Quotes

a. “It made me feel so mean I could almost kissed his foot to get him to take it back. It was fifteen minutes before I could work myself up to go and humble myself to a nigger – but I done it, and I warn’t ever sorry afterwards, neither. I didn’t do him no more mean tricks, and I wouldn’t done that one if I’d known it would make him feel that way,” (pg. 95).

b. “Jim warn’t on his island…the raft was gone! My souls, but I was scared! I couldn’t get my breath for more than a minute. Then I raised a yell…‘Good lan’ is dat you, honey? Doan’ make no noise.’ It was Jim’s voice – nothing ever sounded so good before. I run along the bank a piece and got aboard, and Jim he grabbed me and hugged me,” (pg. 128).

3. Explanation

a. (Quote a above, pg. 95.)

i. Huck thought that playing tricks on a black man was funny, but he didn’t take into account that blacks are human too, and that they have feelings that can get hurt like anyone else’s. In this passage Huck feels horrible because he played a really bad trick on Jim: he made Jim think that he was lost and dead. When Jim realizes that it was all a joke he gets upset, because Jim really cares for Huck and loves him like a son. Because of the way that Huck was brought up, it takes a lot for him to get enough courage to apologize to a black person. After he does apologi...
Superstitions in Huckleberry Finn

In the novel The Adventures of Huckleberry Finn by Mark Twain, there is a lot of superstition. Some examples of superstition in the novel are Huck killing a spider, which is bad luck; the hair-ball used to tell fortunes; and the rattlesnake skin Huck touches, which brings Huck and Jim good and bad luck. Superstition plays an important role in the novel.

In chapter one Huck sees a spider crawling up his shoulder, so he flips it off and it goes into the flame of the candle. Before he can get it out, it is already shriveled up. Huck doesn't need anyone to tell him that it is a bad sign and will give him bad luck. Huck gets scared, shakes his clothes off, and turns in his tracks three times. He then ties a lock of his hair with a thread to keep the witches away. "You do that when you've lost a horseshoe that you've found, instead of nailing it up over the door, but I hadn't ever heard anybody say it was any way to keep off bad luck when you'd killed a spider." (Twain 5).

In chapter four Huck sees Pap's footprints in the snow, so he goes to Jim to ask him why Pap is here. Jim gets a hair-ball the size of a fist that he took from an ox's stomach. Jim asks the hair-ball why Pap is here, but the hair-ball won't answer. Jim says it needs money, so Huck gives Jim a counterfeit quarter. Jim puts the quarter under the hair-ball. The hair-ball talks to Jim, and Jim tells Huck what it says: "Yo' ole father doan' know yit what he's a-gwyne to do. Sometimes he spec he'll go 'way, en den ag'in he spec he'll stay. De bes' way is to res' easy en let de ole man take his own way. Dey's two angels hoverin' roun' 'bout him. One uv 'em is white en shiny, en t'other one is black. De white one gits him to go right a little while, den de black one sail in en bust it all up. A bo...

The Adventures of Huckleberry Finn: Homework Assignment, Chapters I to IV

1. Even though Tom feels that Huck is not “respectable,” Huck is likeable, even admirable in many ways.
We get certain impressions of Huck’s character in the opening chapters of the book. It seems Huck is used to caring for himself most of the time, but is now living with Widow Douglas. Widow Douglas and her sister, Miss Watson, try to “civilize” Huck and educate him. Yet almost from the beginning of the book, Huck clearly doesn’t seem to have any interest in education, religion, or being civilized in general. Another important trait of Huck’s that we learn about quickly is that he is superstitious. “I got into my old rags and my sugar hogshead again, and was free and satisfied.” and “I didn’t need anybody to tell me that that was an awful bad sign and would fetch me some bad luck…” are examples of his rejection of “civilization” and his belief in superstitions.

2. Jim is Miss Watson’s slave. My first impressions of Jim are that he is completely uneducated, very easily fooled, and extremely superstitious. Also, when he can’t explain things, that is where his ability to embellish his stories comes from, and the stories make for entertainment for the rest of the slaves that he tells them to. Example: when Jim hears the noise he is determined to make sure he hears it again, but he falls asleep in just 10 minutes, which shows that he is easily fooled and not that bright.

3. Although we do not meet Huck’s father in these chapters, we do hear some things about him. Pap isn’t a reliable person, and hasn’t been around for over a year. Ben Rogers tells us that “He used to lay drunk with the hogs in the tanyard…”, which tells us he is an alcoholic with a poor reputation.

4. When Huck wants to smoke, the widow forbids him, saying it is a mean and unclean practice. ...

Summary

The Adventures of Huckleberry Finn opens by familiarizing the reader with the events of the book that preceded it, Tom Sawyer. In the town of St. Petersburg, which lies along the Mississippi River, Huckleberry Finn, a poor boy with a drunken bum for a father, and his friend Tom Sawyer, a middle-class boy with an imagination a little too active for his own good, found a robber's stash of gold at the end of the earlier book. As a result of his adventure, Huck gains quite a bit of money (held in a sort of trust for him at the bank) and is adopted by the Widow Douglas, a kind but stifling woman who lives with her sister, the self-righteous Miss Watson. Huck is none too thrilled with his new life of cleanliness, manners, church, and school, but he sticks it out at the behest of Tom, who tells him that in order to take part in his new "robbers' gang" Huck must stay "respectable."

All is well and good until Huck's brutish father, Pap, reappears and demands Huck's money. Judge Thatcher and the Widow try to get legal custody of Huck, but the well-intentioned new judge in town believes in the rights of Huck's natural father and even takes the old drunk into his own home in an attempt to reform him. This effort fails miserably, and Pap soon returns to his old ways. He hangs around town for several months, harassing his son, who in the meantime has learned to read and to tolerate the Widow's attempts to improve him. Finally, outraged when the Widow Douglas warns him to stay away from her house, Pap kidnaps the boy, holding him in a cabin across the river from St. Petersburg.

Whenever he goes out, Pap locks Huck in the cabin, and when he returns home drunk, he beats the boy. Tired of his confinement, and fearing the beatings will worsen, Huck escapes from Pap by faking his own death. Hiding on Jackson's Island out in the middle of the Mississippi River, he watches the townspeople search the river for his body. After a few days on t...