I: Download the software package
Download link:
http://mirrors.hust.edu.cn/apache/mahout/0.9/
II: Extract the files
tar -zxvf mahout-distribution-0.9-src.tar.gz -C /usr/share/
tar -zxvf mahout-distribution-0.9.tar.gz -C /usr/share/
III: Compile the source
1. cd /usr/share/mahout-distribution-0.9-src
2. Apply the patch: download the patch file, then apply it with the patch command
wget https://issues.apache.org/jira/secure/attachment/12629768/1329.patch
patch -p0 < 1329.patch
3. Compile
mvn clean package -Dhadoop.profile=200 -Dhadoop.2.version=2.2.0 -Dhbase.version=0.68.0-hadoop2 -DskipTests
IV: Replace the jar files
Replace the jars under /usr/share/mahout-distribution-0.9 with the newly compiled jar files, six in total:
mahout-core-0.9.jar
mahout-core-0.9-job.jar
mahout-examples-0.9.jar
mahout-examples-0.9-job.jar
mahout-integration-0.9.jar
mahout-math-0.9.jar
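The recommender implementation class is:
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob. Its input data goes in the default input directory, specified with the mapred.input.dir parameter. The input consists of text files made up of userID,itemID[,preferencevalue] records, and the directory may contain several such files.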
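To make the input format concrete, here is a minimal Python sketch that parses userID,itemID[,preferencevalue] lines. The function name `parse_preference` and the default value of 1.0 for a missing preference are assumptions of this sketch, not something taken from Mahout itself.

```python
def parse_preference(line, default_value=1.0):
    """Parse one 'userID,itemID[,preferencevalue]' line into a tuple."""
    fields = line.strip().split(",")
    if len(fields) not in (2, 3):
        raise ValueError(f"expected 2 or 3 fields, got {len(fields)}: {line!r}")
    user_id, item_id = int(fields[0]), int(fields[1])
    # Assumption: a record without a preference value counts as 1.0.
    value = float(fields[2]) if len(fields) == 3 else default_value
    return user_id, item_id, value


# Made-up sample records, in the shape the input files are expected to have.
sample_lines = ["1,101,5.0", "1,102,3.0", "2,101"]
prefs = [parse_preference(line) for line in sample_lines]
```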
The relevant runtime parameters are:
numRecommendations: number of recommendations to produce per user;
usersFile: list of the user IDs to produce recommendations for;
itemsFile: list of the item IDs that may be recommended;
filterFile: training file used to filter recommendations, containing comma-separated userID,itemID pairs;
booleanData: treat the training data as having no preference values;
maxPrefsPerUser: maximum number of preferences considered per user in the final recommendation phase;
minPrefsPerUser: ignore users with fewer preferences than this in the similarity computation;
maxSimilaritiesPerItem: maximum number of similarities considered per item;
maxCooccurrencesPerItem: try to cap the number of cooccurrences per item to this;
similarityClassname: name of the distributed similarity class to instantiate; alternatively use one of the predefined similarities. The available similarity classes are:
SIMILARITY_COOCCURRENCE(DistributedCooccurrenceVectorSimilarity.class),
SIMILARITY_EUCLIDEAN_DISTANCE(DistributedEuclideanDistanceVectorSimilarity.class),
SIMILARITY_LOGLIKELIHOOD(DistributedLoglikelihoodVectorSimilarity.class),
SIMILARITY_PEARSON_CORRELATION(DistributedPearsonCorrelationVectorSimilarity.class),
SIMILARITY_TANIMOTO_COEFFICIENT(DistributedTanimotoCoefficientVectorSimilarity.class),
SIMILARITY_UNCENTERED_COSINE(DistributedUncenteredCosineVectorSimilarity.class),
SIMILARITY_UNCENTERED_ZERO_ASSUMING_COSINE(DistributedUncenteredZeroAssumingCosineVectorSimilarity.class),
SIMILARITY_CITY_BLOCK(DistributedCityBlockVectorSimilarity.class);
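As a rough illustration of how these parameters fit together, the sketch below assembles a `hadoop jar` command line for the job. It assumes the long-form options (`--input`, `--output`, `--numRecommendations`, `--similarityClassname`, `--booleanData`) accepted by the Mahout 0.9 command-line parser. The helper name `build_recommender_command` and the paths in the usage line are made up for illustration.

```python
def build_recommender_command(input_dir, output_dir, num_recommendations=10,
                              similarity="SIMILARITY_COOCCURRENCE",
                              boolean_data=False,
                              job_jar="mahout-core-0.9-job.jar"):
    """Return the argument list for launching RecommenderJob via 'hadoop jar'."""
    cmd = [
        "hadoop", "jar", job_jar,
        "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob",
        "--input", input_dir,
        "--output", output_dir,
        "--numRecommendations", str(num_recommendations),
        "--similarityClassname", similarity,
    ]
    if boolean_data:
        # Only pass the flag when the input has no preference values.
        cmd += ["--booleanData", "true"]
    return cmd


# Hypothetical HDFS paths, used only to show the shape of the command.
command = build_recommender_command("/user/demo/prefs", "/user/demo/recs",
                                    num_recommendations=5)
```

The resulting list could be handed to `subprocess.run` on a machine where Hadoop and the rebuilt jars are installed.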
RecommenderJob runs a series of MapReduce jobs, which you can rewrite during development to suit your own needs. However, RecommenderJob is declared final, which is a bit of a headache.
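1. itemIDIndex job:
map: parses the input data and maps each long item ID to an int index so that later stages can work with it. The processing involves matrix computation, and each item corresponds to one dimension of the matrix, so the IDs must be converted to ints. Emits index-ID pairs;
reducer: validates the index-ID pairs and emits index-ID pairs;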
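The idea behind this job can be sketched in a few lines of Python. Folding the 64-bit ID into a non-negative 31-bit int by XORing its two halves is one common way to do this. Both that folding and the collision rule in the reduce step (keep the smallest ID) are assumptions of this sketch and are not confirmed by the text above.

```python
from collections import defaultdict


def id_to_index(item_id):
    """Fold a non-negative 64-bit ID into a non-negative 31-bit index."""
    low = item_id & 0xFFFFFFFF
    high = (item_id >> 32) & 0xFFFFFFFF
    return (low ^ high) & 0x7FFFFFFF


def build_item_id_index(item_ids):
    """Simulate the map and reduce steps of the index job in memory."""
    # "Map" step: emit (index, ID) for every item ID seen.
    pairs = [(id_to_index(item_id), item_id) for item_id in item_ids]
    # "Reduce" step: group by index; on a collision keep the smallest ID
    # (an assumption made here so the mapping is deterministic).
    grouped = defaultdict(list)
    for index, item_id in pairs:
        grouped[index].append(item_id)
    return {index: min(ids) for index, ids in grouped.items()}


# 4 and (1 << 32) + 5 deliberately collide on index 4.
item_index = build_item_id_index([5, 4, (1 << 32) + 5])
```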
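2. toUserVector job:
ToItemPrefsMapper: reads the preference information from the input data and converts it into user-preference pairs.
ToUserVectorReducer: converts the user-preference pairs into user to preference-vector pairs, where the vector is indexed by all of the item IDs.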
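A minimal in-memory sketch of this grouping step follows. It represents each sparse preference vector as a plain dict and, for readability, keys it by the raw item ID rather than the int index from job 1. The function name and sample data are made up.

```python
from collections import defaultdict


def to_user_vectors(prefs):
    """Group (user, item, value) triples into one sparse vector per user."""
    vectors = defaultdict(dict)
    for user_id, item_id, value in prefs:
        # Each dict entry is one non-zero dimension of the user's vector.
        vectors[user_id][item_id] = value
    return dict(vectors)


prefs = [(1, 101, 5.0), (1, 102, 3.0), (2, 101, 4.0)]
user_vectors = to_user_vectors(prefs)
```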
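3. countUsers job: counts the users; the output is a (user count, empty) pair.
4. maybePruneAndTransponse, a rather strangely named job.
MaybePruneRowsMapper: takes the output of job 2 as input and emits a preference-matrix cell for each item, that is, pairs of item index and matrix cell.
ToItemVectorsReducer: outputs pairs of matrix row number (the item index) and matrix row vector.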
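Leaving the pruning aside, the transpose half of this job amounts to turning user rows into item rows. The sketch below shows only that part, on made-up data. How the real job prunes rows is not covered here.

```python
from collections import defaultdict


def transpose(user_vectors):
    """Turn {user: {item: value}} into {item: {user: value}}."""
    item_vectors = defaultdict(dict)
    for user_id, vector in user_vectors.items():
        for item_id, value in vector.items():
            # One matrix cell (item row, user column) per preference.
            item_vectors[item_id][user_id] = value
    return dict(item_vectors)


item_vectors = transpose({1: {101: 5.0, 102: 3.0}, 2: {101: 4.0}})
```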
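5. RowSimilarityJob: computes the similarity matrix by reusing an existing job. The input is the matrix output by job 4. The output is the similarity matrix, that is, item to similarity-vector pairs, where each similarity vector holds the similarity values between the current item and the other items.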
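As an illustration of what such a similarity matrix looks like, here is a small sketch that uses plain cooccurrence counts (the number of users who have a preference for both items) as the similarity. This is a simplified stand-in for the distributed similarity classes listed earlier, not their actual implementation.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_similarity(item_vectors):
    """Return {item: {other_item: number of users sharing both items}}."""
    similarity = defaultdict(dict)
    for a, b in combinations(sorted(item_vectors), 2):
        # Users that appear in both item rows.
        common = len(item_vectors[a].keys() & item_vectors[b].keys())
        if common:
            similarity[a][b] = float(common)
            similarity[b][a] = float(common)
    return dict(similarity)


item_vectors = {101: {1: 5.0, 2: 4.0}, 102: {1: 3.0}, 103: {2: 2.0}}
similarity = cooccurrence_similarity(item_vectors)
```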
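6. prePartialMultiply1: takes the output of job 5 as input and sets the diagonal entries of the similarity matrix, that is, the (N,N) values, to Double.NaN in preparation for the later computation;
7. prePartialMultiply2: takes the output of job 2 as input and splits each user-(item vector) pair into item-(userId, preference value) pairs. If usersFile is set, only the users listed in usersFile are processed.
8. partialMultiply: merges the outputs of jobs 6 and 7 into item-(similarity vector, userId, preference value) pairs.
9. itemFiltering: if a filterFile is given, processes it and converts it into item-(similarity vector, userId, preference value) pairs, in which the similarity vector values are 0;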
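10. aggregateAndRecommend: takes the combined outputs of jobs 8 and 9 as input.
PartialMultiplyMapper: converts the set of item-(similarity vector, userId, preference value) pairs into userId-(preference value, similarity vector) pairs;
AggregateAndRecommendReducer: aggregates the map output and produces userId-(list of (itemId, recommendation value)) pairs, where the list is sorted by recommendation score. If maxPrefsPerUser, minPrefsPerUser, or maxCooccurrencesPerItem is set, only the userId pairs that meet those conditions are produced.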
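Jobs 6 through 10 can be condensed into one in-memory sketch: multiply each preference by the similarity column of its item, sum the results per user, and rank them. It uses a plain weighted sum as the score, which is a simplification. The NaN diagonal is used here to drop items the user already has, which is one plausible reading of why job 6 sets it. All names and data are made up.

```python
import math
from collections import defaultdict


def recommend(user_vectors, similarity, num_recommendations=10):
    """Return {user: [(item, score), ...]} sorted by descending score."""
    results = {}
    for user_id, prefs in user_vectors.items():
        scores = defaultdict(float)
        for item_id, pref in prefs.items():
            # Similarity column of this item, with the diagonal set to NaN.
            column = dict(similarity.get(item_id, {}))
            column[item_id] = math.nan
            for other_id, sim in column.items():
                # NaN propagates, so already-rated items end up as NaN.
                scores[other_id] += pref * sim
        ranked = sorted(
            ((item, score) for item, score in scores.items()
             if not math.isnan(score)),
            key=lambda pair: (-pair[1], pair[0]),
        )
        results[user_id] = ranked[:num_recommendations]
    return results


user_vectors = {1: {101: 5.0, 102: 3.0}, 2: {101: 4.0}}
similarity = {101: {102: 1.0, 103: 1.0}, 102: {101: 1.0}, 103: {101: 1.0}}
recommendations = recommend(user_vectors, similarity)
```

User 1 already has items 101 and 102, so only 103 survives. User 2 gets 102 and 103 with equal scores, ordered by item ID.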