Graduate papers: Medical Data Analytics Using R

The dataset has five attributes:
1.) R for Recency = months since the last donation,
2.) F for Frequency = total number of donations,
3.) M for Monetary = total amount of blood donated in c.c.,
4.) T for Time = months since the first donation, and
5.) a binary variable = 1 if the person donated blood, 0 if they did not.

The main idea behind this dataset is the concept of customer relationship management (CRM). Based on three metrics, Recency, Frequency and Monetary (RFM), which are 3 of the 5 attributes of the dataset, we should be able to predict whether a customer is likely to donate blood again in response to a marketing campaign. For example, customers who have donated or visited more recently (Recency), more frequently (Frequency) or with higher monetary values (Monetary) are more likely to respond to a marketing effort. Customers with lower RFM scores are less likely to react. It is also known from customer behavior that the time of the first positive interaction (donation, purchase) is not significant. However, the recency of the last donation is very important.

In the traditional RFM implementation each customer is ranked on their RFM values against all the other customers, and that produces a score for every customer. Customers with larger scores are more likely to react in a positive way (for example, visit again or donate). The model constructs the formula that addresses the following problem: keep in the repository only the customers who are more likely to continue donating in the future, and remove those who are less likely to donate, given a certain period of time.
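As a rough illustration of the traditional RFM ranking described above, the sketch below scores six made-up donors by splitting each attribute into three rank-based groups. The data frame, the `tercile` helper and the column names are assumptions made for this example only; they are not taken from the project code.

```r
# Hypothetical toy data: six donors with made-up R, F and M values
donors <- data.frame(
  id        = 1:6,
  recency   = c(2, 4, 11, 16, 23, 35),
  frequency = c(50, 13, 7, 3, 2, 1),
  monetary  = c(12500, 3250, 1750, 750, 500, 250)
)

# Rank the values against all other customers and cut the ranks into 3 groups
tercile <- function(x) cut(rank(x, ties.method = "first"), breaks = 3, labels = FALSE)

# A low recency is good, so the recency group is reversed (3 = most recent)
donors$r_score <- 4 - tercile(donors$recency)
donors$f_score <- tercile(donors$frequency)
donors$m_score <- tercile(donors$monetary)

# One combined score per customer; larger means more likely to respond
donors$rfm_score <- donors$r_score + donors$f_score + donors$m_score
print(donors)
```

The project itself does not use rank-based groups; it uses fixed cut-offs for Recency and Frequency, which are described further down.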
The previous statement also defines the problem that will be trained and tested in this project.

Firstly, I created a .csv file and generated 748 unique random numbers in Excel in the range [1,748] in the first column, which corresponds to the customer or user ID. Then I transferred the whole data from the .txt file (transfusion.data) to the .csv file in Excel by using the delimited (,) option. Then I randomly split it into a train file and a test file. The train file contains 530 instances and the test file has 218 instances. Afterwards, I read both the training dataset and the test dataset.

From the previous results, we can see that there are no missing or invalid values. Data ranges and units seem reasonable. Figure 1 above shows boxplots of all the attributes for both the train and test datasets. By examining the figure, we notice that both datasets have similar distributions and that some outliers (Monetary > 2,500) are visible. The volume of blood variable has a high correlation with frequency. Because the volume of blood that is donated each time is fixed, the Monetary value is proportional to the Frequency (number of donations) of each person. For example, if the amount of blood drawn from each person was 250 ml/bag (Taiwan Blood Services Foundation 2007), then Monetary = 250 * Frequency. So it is reasonable to expect that customers with a higher frequency will have a much higher Monetary value. This is also why the predictive model does not use the Monetary attribute in the implementation.
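The proportionality between Monetary and Frequency can be checked with a few lines of R. This is only a sketch on made-up frequencies; the 250 ml figure comes from the text, while the data frame and variable names are assumptions for the example.

```r
# Hypothetical donation counts for five donors
donors <- data.frame(frequency = c(1, 2, 7, 13, 50))

# Fixed volume per donation, as stated in the text
ml_per_donation <- 250
donors$monetary <- ml_per_donation * donors$frequency

# A perfectly linear relationship gives a correlation of 1
r <- cor(donors$frequency, donors$monetary)

# If every row satisfies Monetary = 250 * Frequency, Monetary adds no information
is_redundant <- all(donors$monetary == ml_per_donation * donors$frequency)
print(c(correlation = r, redundant = is_redundant))
```

On the real data the same two checks would show whether Monetary can safely be dropped from the model.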
This can also be confirmed visually by examining the Monetary outliers for the train set. This returns 83 instances.

In order to better understand the statistical dispersion of the whole dataset (748 instances), we will look at the standard deviation (SD) between the Recency and the variable that records whether the customer has donated blood (the binary variable), and the SD between the Frequency and the binary variable. The spread of scores around the mean is small, which means the data is concentrated. This can also be seen in the plots.

From this correlation matrix, we can verify what was stated above: the Frequency and the Monetary values are proportional inputs, which shows in their high correlation. Another observation is that the various Recency values are not multiples of 3. This contradicts what the description says about the data being collected every 3 months. Additionally, there is always a maximum number of times you can donate blood in a given period (e.g. once per month), but the data shows that 36 customers donated blood more than once and 6 customers donated 3 or more times in the same month.

Two features will be used to predict whether a customer is likely to donate again: the Recency and the Frequency (RF). The Monetary feature will be dropped. The number of categories for the R and F attributes will be 3. The highest RF score will be 33, equivalent to 6 when the two digits are added together, and the lowest will be 11, equivalent to 2 when added together. The threshold for the added score that determines whether a customer is more likely to donate blood again or not will be set to 4, which is the median value.
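The choice of 4 as the threshold can be illustrated by listing every possible combination of the two category scores. The sketch below is not part of the project code; the names `r_rank`, `f_rank` and `sum_rank` are assumptions for the example.

```r
# All 9 combinations of a Recency category (1-3) and a Frequency category (1-3)
scores <- expand.grid(r_rank = 1:3, f_rank = 1:3)
scores$sum_rank <- scores$r_rank + scores$f_rank

# The added score runs from 2 (RF = 11) to 6 (RF = 33)
score_range  <- range(scores$sum_rank)
score_median <- median(scores$sum_rank)

# Customers at or above the threshold are kept as likely future donors
threshold <- 4
scores$likely_donor <- scores$sum_rank >= threshold
print(scores)
```

Note that this is the median of the possible score combinations, not necessarily the median of the scores observed in the data.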
The users will be assigned to categories by sorting on the RF attributes as well as their scores. The file with the donors will be sorted on Recency first (in ascending order) because we want to see which customers have donated blood more recently. Then it will be sorted on Frequency within each Recency category (in descending order this time, because we want to see which customers have donated more times). Apart from sorting, we need to apply some business rules that emerged after multiple tests.

For Recency (Business rule 1):
If the Recency is less than 15 months, then these customers will be assigned to category 3.
If the Recency is equal to or greater than 15 months and less than 26 months, then these customers will be assigned to category 2.
Otherwise, if the Recency is equal to or greater than 26 months, then these customers will be assigned to category 1.

And for Frequency (Business rule 2):
If the Frequency is equal to or greater than 25 times, then these customers will be assigned to category 3.
If the Frequency is less than 25 times and greater than 15 times, then these customers will be assigned to category 2.
If the Frequency is equal to or less than 15 times, then these customers will be assigned to category 1.

RESULTS

The output of the program is two smaller files, one resulting from the train file and the other from the test file, which exclude the customers that should not be considered future targets and keep those that are likely to respond. Some statistics about the precision, recall and balanced F-score of the train and test files have been calculated and printed.
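The two business rules and the threshold can be sketched in vectorised form as follows. This is an illustrative alternative to the row-by-row loops in the appendix, not the code that produced the results; the function names and the toy donors are assumptions.

```r
# Business rule 1: Recency in months -> category 3 (recent), 2 or 1 (long ago)
recency_rank <- function(recency) {
  ifelse(recency < 15, 3, ifelse(recency < 26, 2, 1))
}

# Business rule 2: number of donations -> category 3 (frequent), 2 or 1 (rare)
frequency_rank <- function(frequency) {
  ifelse(frequency >= 25, 3, ifelse(frequency > 15, 2, 1))
}

# Hypothetical donors chosen to sit on the category boundaries
donors <- data.frame(
  recency   = c(2, 15, 25, 26, 40),
  frequency = c(30, 25, 16, 15, 1)
)
donors$r_rank   <- recency_rank(donors$recency)
donors$f_rank   <- frequency_rank(donors$frequency)
donors$sum_rank <- donors$r_rank + donors$f_rank

# Apply the threshold of 4 to keep the likely future donors
donors$likely_donor <- donors$sum_rank >= 4
print(donors)
```

Writing the rules as functions makes the boundary cases (15, 25 and 26) easy to test in isolation.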
Furthermore, we compute the absolute difference between the results retrieved from the train and test files to get the offset error between these statistics. By doing this and verifying that the error numbers are negligible, we validate the consistency of the implemented model.

Moreover, we draw two confusion matrices, one for the test file and one for the train file, by calculating the true positives, false negatives, false positives and true negatives. In our case, true positives correspond to the customers who donated in March 2007 and were classified as possible future donors. False negatives correspond to the customers who donated in March 2007 but were not classified as possible future targets for marketing campaigns. False positives correspond to customers who did not donate in March 2007 and were incorrectly classified as possible future targets. Lastly, true negatives are customers who did not donate in March 2007 and were correctly classified as unlikely future donors, and were therefore removed from the data file. By classification we mean applying the threshold (4) to separate the customers who are more likely to donate again in a certain future period from those who are less likely.

Lastly, we calculate 2 more single-value metrics for both the train and test files: the Kappa statistic (a general statistic used for classification systems) and the Matthews Correlation Coefficient, or cost/reward measure. Both are normalized statistics for classification systems whose values never exceed 1, so the same statistic can be used even as the number of observations grows.
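The metrics described above all follow from the four confusion-matrix counts. The sketch below gathers the formulas in one helper function; the function name and the example counts are assumptions for illustration and are not the results of this project.

```r
# Sketch: classification metrics from the four confusion-matrix counts
classification_metrics <- function(tp, fp, fn, tn) {
  total     <- tp + fp + fn + tn
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  f_score   <- 2 * precision * recall / (precision + recall)
  accuracy  <- (tp + tn) / total
  # Agreement expected by chance, from the row and column totals
  random_accuracy <- ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / total^2
  kappa <- (accuracy - random_accuracy) / (1 - random_accuracy)
  mcc   <- (tp * tn - fp * fn) /
    sqrt((tp + fp) * (tp + fn) * (fp + tn) * (tn + fn))
  c(precision = precision, recall = recall, f_score = f_score,
    kappa = kappa, mcc = mcc)
}

# Hypothetical counts, made up for the example
m <- classification_metrics(tp = 40, fp = 10, fn = 20, tn = 30)
print(round(m, 3))
```

Calling the same function on the train counts and on the test counts, and taking the absolute difference of the two result vectors, gives the offset errors discussed in the text.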
The errors for both measures are MCC error 0.002577 and Kappa error 0.002808, which are very small (negligible), as with all the previous measures.

REFERENCES

UCI Machine Learning Repository (2008) UCI Machine Learning Repository: Blood Transfusion Service Center data set. Available at: http://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center (Accessed: 30 January 2017).

Taiwan Blood Services Foundation (2015) Operation department. Available at: http://www.blood.org.tw/Internet/english/docDetail.aspx?uid=7741&pid=7681&docid=37144 (Accessed: 31 January 2017).

The appendix with the code starts below. The whole code has also been uploaded to my GitHub profile, and this is the link where it can be accessed:
https://github.com/it21208/RassignmentdataAnalysis/blob/master/RassignmentdataAnalysis.R

library(ggplot2)
library(car)

# read training and testing datasets
traindata <- read.csv("C:/Users/Alexandros/Dropbox/MSc/2nd Semester/Data analysis/Assignment/transfusion.csv")
testdata <- read.csv("C:/Users/Alexandros/Dropbox/MSc/2nd Semester/Data analysis/Assignment/test.csv")

# assign the datasets to dataframes
dftrain <- data.frame(traindata)
dftest <- data.frame(testdata)
sapply(dftrain, typeof)

# give better names to the columns
names(dftrain) <- c("ID", "recency", "frequency", "cc", "time", "donated")
names(dftest) <- c("ID", "recency", "frequency", "cc", "time", "donated")

# drop the time column from both files
dftrain$time <- NULL
dftest$time <- NULL

# sort (train) dataframe on Recency in ascending order
sorted_dftrain <- dftrain[order(dftrain[, 2]), ]
# add a column to the (train) dataframe to hold the Recency score (rank) of each customer
sorted_dftrain[, "Rrank"] <- 0
# convert train file from dataframe format to matrix
matrix_train <- as.matrix(sapply(sorted_dftrain, as.numeric))

# sort (test) dataframe on Recency in ascending order
sorted_dftest <- dftest[order(dftest[, 2]), ]
# add a column to the (test) dataframe to hold the Recency score (rank) of each customer
sorted_dftest[, "Rrank"] <- 0
# convert test file from dataframe format to matrix
matrix_test <- as.matrix(sapply(sorted_dftest, as.numeric))

# categorize matrix_train and add scores for Recency (business rule 1)
for (i in seq_len(nrow(matrix_train))) {
  if (matrix_train[i, 2] < 15) {
    matrix_train[i, 6] <- 3
  } else if ((matrix_train[i, 2] < 26) && (matrix_train[i, 2] >= 15)) {
    matrix_train[i, 6] <- 2
  } else {
    matrix_train[i, 6] <- 1
  }
}

# categorize matrix_test and add scores for Recency (business rule 1)
for (i in seq_len(nrow(matrix_test))) {
  if (matrix_test[i, 2] < 15) {
    matrix_test[i, 6] <- 3
  } else if ((matrix_test[i, 2] < 26) && (matrix_test[i, 2] >= 15)) {
    matrix_test[i, 6] <- 2
  } else {
    matrix_test[i, 6] <- 1
  }
}

# convert matrix_train back to dataframe
sorted_dftrain <- data.frame(matrix_train)
# sort dataframe first by Recency rank (desc.), then by Frequency (desc.)
sorted_dftrain_2 <- sorted_dftrain[order(-sorted_dftrain[, 6], -sorted_dftrain[, 3]), ]
# add a column to the train dataframe to hold the Frequency score (rank) of each customer
sorted_dftrain_2[, "Frank"] <- 0
# convert dataframe to matrix
matrix_train <- as.matrix(sapply(sorted_dftrain_2, as.numeric))

# convert matrix_test back to dataframe
sorted_dftest <- data.frame(matrix_test)
# sort dataframe first by Recency rank (desc.), then by Frequency (desc.)
sorted_dftest2 <- sorted_dftest[order(-sorted_dftest[, 6], -sorted_dftest[, 3]), ]
# add a column to the test dataframe to hold the Frequency score (rank) of each customer
sorted_dftest2[, "Frank"] <- 0
# convert dataframe to matrix
matrix_test <- as.matrix(sapply(sorted_dftest2, as.numeric))

# categorize matrix_train, add scores for Frequency (business rule 2)
for (i in seq_len(nrow(matrix_train))) {
  if (matrix_train[i, 3] >= 25) {
    matrix_train[i, 7] <- 3
  } else if ((matrix_train[i, 3] > 15) && (matrix_train[i, 3] < 25)) {
    matrix_train[i, 7] <- 2
  } else {
    matrix_train[i, 7] <- 1
  }
}

# categorize matrix_test, add scores for Frequency (business rule 2)
for (i in seq_len(nrow(matrix_test))) {
  if (matrix_test[i, 3] >= 25) {
    matrix_test[i, 7] <- 3
  } else if ((matrix_test[i, 3] > 15) && (matrix_test[i, 3] < 25)) {
    matrix_test[i, 7] <- 2
  } else {
    matrix_test[i, 7] <- 1
  }
}

# convert matrix_train back to dataframe
sorted_dftrain <- data.frame(matrix_train)
# sort (train) dataframe first on Recency rank (desc.), second on Frequency rank (desc.)
sorted_dftrain_2 <- sorted_dftrain[order(-sorted_dftrain[, 6], -sorted_dftrain[, 7]), ]
# add another column for the sum of the Recency rank and the Frequency rank
sorted_dftrain_2[, "SumRankRAndF"] <- 0
# convert dataframe to matrix
matrix_train <- as.matrix(sapply(sorted_dftrain_2, as.numeric))

# convert matrix_test back to dataframe
sorted_dftest <- data.frame(matrix_test)
# sort (test) dataframe first on Recency rank (desc.), second on Frequency rank (desc.)
sorted_dftest2 <- sorted_dftest[order(-sorted_dftest[, 6], -sorted_dftest[, 7]), ]
# add another column for the sum of the Recency rank and the Frequency rank
sorted_dftest2[, "SumRankRAndF"] <- 0
# convert dataframe to matrix
matrix_test <- as.matrix(sapply(sorted_dftest2, as.numeric))

# sum Recency rank and Frequency rank for the train file
matrix_train[, 8] <- matrix_train[, 6] + matrix_train[, 7]
# sum Recency rank and Frequency rank for the test file
matrix_test[, 8] <- matrix_test[, 6] + matrix_test[, 7]

# convert matrix_train back to dataframe
sorted_dftrain <- data.frame(matrix_train)
# sort train dataframe according to total rank in descending order
sorted_dftrain_2 <- sorted_dftrain[order(-sorted_dftrain[, 8]), ]
# convert sorted train dataframe to matrix
matrix_train <- as.matrix(sapply(sorted_dftrain_2, as.numeric))

# convert matrix_test back to dataframe
sorted_dftest <- data.frame(matrix_test)
# sort test dataframe according to total rank in descending order
sorted_dftest2 <- sorted_dftest[order(-sorted_dftest[, 8]), ]
# convert sorted test dataframe to matrix
matrix_test <- as.matrix(sapply(sorted_dftest2, as.numeric))

# apply the business rule to the train file: count customers whose score >= 4 and who donated,
# and count all customers who donated in the train dataset
count_train_predicted_donations <- 0
counter_train <- 0
number_donation_instances_whole_train <- 0
false_positives_train_counter <- 0
for (i in seq_len(nrow(matrix_train))) {
  if ((matrix_train[i, 8] >= 4) && (matrix_train[i, 5] == 1)) {
    count_train_predicted_donations <- count_train_predicted_donations + 1
  }
  if ((matrix_train[i, 8] >= 4) && (matrix_train[i, 5] == 0)) {
    false_positives_train_counter <- false_positives_train_counter + 1
  }
  if (matrix_train[i, 8] >= 4) {
    counter_train <- counter_train + 1
  }
  if (matrix_train[i, 5] == 1) {
    number_donation_instances_whole_train <- number_donation_instances_whole_train + 1
  }
}

# apply the business rule to the test file: count customers whose score >= 4 and who donated,
# and count all customers who donated in the test dataset
count_test_predicted_donations <- 0
counter_test <- 0
number_donation_instances_whole_test <- 0
false_positives_test_counter <- 0
for (i in seq_len(nrow(matrix_test))) {
  if ((matrix_test[i, 8] >= 4) && (matrix_test[i, 5] == 1)) {
    count_test_predicted_donations <- count_test_predicted_donations + 1
  }
  if ((matrix_test[i, 8] >= 4) && (matrix_test[i, 5] == 0)) {
    false_positives_test_counter <- false_positives_test_counter + 1
  }
  if (matrix_test[i, 8] >= 4) {
    counter_test <- counter_test + 1
  }
  if (matrix_test[i, 5] == 1) {
    number_donation_instances_whole_test <- number_donation_instances_whole_test + 1
  }
}

# convert matrix_train to dataframe
dftrain <- data.frame(matrix_train)
# remove the group of customers who are less likely to donate again in the future from the train file
dftrain_final <- dftrain[seq_len(counter_train), 1:8]

# convert matrix_test to dataframe
dftest <- data.frame(matrix_test)
# remove the group of customers who are less likely to donate again in the future from the test file
dftest_final <- dftest[seq_len(counter_test), 1:8]

# save final train dataframe as a CSV in the specified directory (reduced set of target future customers)
write.csv(dftrain_final, file = "C:/Users/Alexandros/Dropbox/MSc/2nd Semester/Data analysis/Assignment/train_output.csv", row.names = FALSE)
# save final test dataframe as a CSV in the specified directory (reduced set of target future customers)
write.csv(dftest_final, file = "C:/Users/Alexandros/Dropbox/MSc/2nd Semester/Data analysis/Assignment/test_output.csv", row.names = FALSE)

# train precision = number of relevant instances retrieved / number of retrieved instances (collection of 530)
precision_train <- count_train_predicted_donations / counter_train
# train recall = number of relevant instances retrieved / number of relevant instances in the collection of 530
recall_train <- count_train_predicted_donations / number_donation_instances_whole_train
# the balanced F-score combines precision and recall as their harmonic mean (train file)
f_balanced_score_train <-
  2 * (precision_train * recall_train) / (precision_train + recall_train)

# test precision
precision_test <- count_test_predicted_donations / counter_test
# test recall
recall_test <- count_test_predicted_donations / number_donation_instances_whole_test
# the balanced F-score for the test file
f_balanced_score_test <- 2 * (precision_test * recall_test) / (precision_test + recall_test)

# error in precision
error_precision <- abs(precision_train - precision_test)
# error in recall
error_recall <- abs(recall_train - recall_test)
# error in F-balanced scores
error_f_balanced_scores <- abs(f_balanced_score_train - f_balanced_score_test)

# print statistics for verification and validation
cat("Precision with training dataset:", precision_train, "\n")
cat("Recall with training dataset:", recall_train, "\n")
cat("Precision with testing dataset:", precision_test, "\n")
cat("Recall with testing dataset:", recall_test, "\n")
cat("The F-balanced score with training dataset:", f_balanced_score_train, "\n")
cat("The F-balanced score with testing dataset:", f_balanced_score_test, "\n")
cat("Error in precision:", error_precision, "\n")
cat("Error in recall:", error_recall, "\n")
cat("Error in F-balanced scores:", error_f_balanced_scores, "\n")

# confusion matrix (true positives, false positives, false negatives, true negatives)
# true positives for train is the variable count_train_predicted_donations
# false positives for train is the variable false_positives_train_counter
# calculate false negatives for train
false_negatives_for_train <- number_donation_instances_whole_train - count_train_predicted_donations
# calculate true negatives for train
true_negatives_for_train <- (nrow(matrix_train) - number_donation_instances_whole_train) - false_positives_train_counter
collect_train <- c(false_positives_train_counter, true_negatives_for_train, count_train_predicted_donations, false_negatives_for_train)

# true positives for test is the variable count_test_predicted_donations
# false positives for test is the variable false_positives_test_counter
# calculate false negatives for test
false_negatives_for_test <- number_donation_instances_whole_test - count_test_predicted_donations
# calculate true negatives for test
true_negatives_for_test <- (nrow(matrix_test) - number_donation_instances_whole_test) - false_positives_test_counter
collect_test <- c(false_positives_test_counter, true_negatives_for_test, count_test_predicted_donations, false_negatives_for_test)

# cell order matches collect_*: FP, TN, TP, FN
TrueCondition <- factor(c(0, 0, 1, 1))
PredictedCondition <- factor(c(1, 0, 1, 0))

# print confusion matrix for train
df_conf_mat_train <- data.frame(TrueCondition, PredictedCondition, collect_train)
ggplot(data = df_conf_mat_train, mapping = aes(x = PredictedCondition, y = TrueCondition)) +
  geom_tile(aes(fill = collect_train), colour = "white") +
  geom_text(aes(label = sprintf("%1.0f", collect_train)), vjust = 1) +
  scale_fill_gradient(low = "blue", high = "red") +
  theme_bw() + theme(legend.position = "none")

# print confusion matrix for test
df_conf_mat_test <- data.frame(TrueCondition, PredictedCondition, collect_test)
ggplot(data = df_conf_mat_test, mapping = aes(x = PredictedCondition, y = TrueCondition)) +
  geom_tile(aes(fill = collect_test), colour = "white") +
  geom_text(aes(label = sprintf("%1.0f", collect_test)), vjust = 1) +
  scale_fill_gradient(low = "blue", high = "red") +
  theme_bw() + theme(legend.position = "none")

# MCC = (TP * TN - FP * FN) / sqrt((TP+FP) (TP+FN) (FP+TN) (TN+FN)) for train values
mcc_train <- ((count_train_predicted_donations * true_negatives_for_train) - (false_positives_train_counter * false_negatives_for_train)) /
  sqrt((count_train_predicted_donations + false_positives_train_counter) *
       (count_train_predicted_donations + false_negatives_for_train) *
       (false_positives_train_counter + true_negatives_for_train) *
       (true_negatives_for_train + false_negatives_for_train))
# print MCC for train
cat("Matthews Correlation Coefficient for train:", mcc_train, "\n")

# MCC = (TP * TN - FP * FN) / sqrt((TP+FP) (TP+FN) (FP+TN) (TN+FN)) for test values
mcc_test <-
  ((count_test_predicted_donations * true_negatives_for_test) - (false_positives_test_counter * false_negatives_for_test)) /
  sqrt((count_test_predicted_donations + false_positives_test_counter) *
       (count_test_predicted_donations + false_negatives_for_test) *
       (false_positives_test_counter + true_negatives_for_test) *
       (true_negatives_for_test + false_negatives_for_test))
# print MCC for test
cat("Matthews Correlation Coefficient for test:", mcc_test, "\n")
# print MCC error between train and test
cat("Matthews Correlation Coefficient error:", abs(mcc_train - mcc_test), "\n")

# Total = TP + TN + FP + FN for train
total_train <- count_train_predicted_donations + true_negatives_for_train + false_positives_train_counter + false_negatives_for_train
# Total = TP + TN + FP + FN for test
total_test <- count_test_predicted_donations + true_negatives_for_test + false_positives_test_counter + false_negatives_for_test

# totalAccuracy = (TP + TN) / Total for train values
totalAccuracyTrain <- (count_train_predicted_donations + true_negatives_for_train) / total_train
# totalAccuracy = (TP + TN) / Total for test values
totalAccuracyTest <- (count_test_predicted_donations + true_negatives_for_test) / total_test

# randomAccuracy = ((TN+FP)*(TN+FN)+(FN+TP)*(FP+TP)) / (Total*Total) for train values
randomAccuracyTrain <- ((true_negatives_for_train + false_positives_train_counter) * (true_negatives_for_train + false_negatives_for_train) +
  (false_negatives_for_train + count_train_predicted_donations) * (false_positives_train_counter + count_train_predicted_donations)) /
  (total_train * total_train)
# randomAccuracy = ((TN+FP)*(TN+FN)+(FN+TP)*(FP+TP)) / (Total*Total) for test values
randomAccuracyTest <- ((true_negatives_for_test + false_positives_test_counter) * (true_negatives_for_test + false_negatives_for_test) +
  (false_negatives_for_test + count_test_predicted_donations) * (false_positives_test_counter + count_test_predicted_donations)) /
  (total_test * total_test)

# kappa = (totalAccuracy - randomAccuracy) / (1 - randomAccuracy) for train
kappa_train <-
  (totalAccuracyTrain - randomAccuracyTrain) / (1 - randomAccuracyTrain)
# kappa = (totalAccuracy - randomAccuracy) / (1 - randomAccuracy) for test
kappa_test <- (totalAccuracyTest - randomAccuracyTest) / (1 - randomAccuracyTest)
# print kappa error
cat("Kappa error:", abs(kappa_train - kappa_test), "\n")

Wednesday, July 3, 2019
