A relatively objective analysis of machine learning versus statistics





I certainly couldn't write anything this impressive myself, but here are some of my reactions and a few excerpts.
The original post is at:
http://brenocon.com/blog/2008/12/statistics-vs-machine-learning-fight/



## A comparison chart



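## As mentioned earlier in the original: most of machine learning is built on the probability theory of statistics...
## The following is a point I find especially apt, particularly the last few sentences.
## In practice, the two sides have different goals.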
I’ll also note that there are definitely a number of topics in ML that aren’t very related to statistics or probability. Max-margin methods: if all we care about is prediction, why bother using a probability model at all? Why not just optimize the spatial geometry instead? SVM’s don’t require a lick of probability theory to understand. (Of course probability-based approaches are huge in ML, but it’s important to remember they’re not the only game in town, and there is no necessary reason they must be.) And then there are non-traditional settings such as online learning, reinforcement learning, and active learning, where the structure of access to information is in play. There are certainly plenty of things in statistics that aren’t considered part of ML — say, regression diagnostics and significance testing. Finally, many ML problems involve large, high dimensional data and models, where computational issues are very important. For example, in statistical machine translation, alignment models are described with probability theory and fit to data, but their structure is complex enough that optimal inference is intractable, and how you do approximate inference (EM, Viterbi, beam search, etc.) is a very major issue.
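To make the max-margin remark concrete, here is a minimal sketch (my own illustration, not code from the original post) of a linear soft-margin SVM trained by plain subgradient descent on the hinge loss. The function names `train_linear_svm` and `predict`, the toy data, and the settings `lam`, `lr`, and `epochs` are all assumptions chosen for the example. Note that no probability model appears anywhere: the code only pushes points away from a separating line.

```python
# Hypothetical illustration: linear soft-margin SVM, primal form.
# Objective (assumed for this sketch):
#   lam/2 * ||w||^2 + sum_i max(0, 1 - y_i * (w . x_i + b))

def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=100):
    """Fit w, b by per-sample subgradient steps on the hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            # Signed distance-like score; >= 1 means "safely on the right side".
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # The regulariser always shrinks w slightly toward zero.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Point is inside the margin: move the boundary away from it.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Classify by which side of the hyperplane x falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data (made up for this sketch).
points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0),
          (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print("weights:", w, "bias:", b)
print("predictions:", [predict(w, b, x) for x in points])
```

In practice you would likely reach for a library solver instead; the point of the sketch is only that the whole method can be stated as geometry plus optimization.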

This point is also quite interesting:
I think this is reflective of the differences in institutional culture between CS and Stats. There’s an interesting John Langford post on part of the issue, which he calls “The Stats Handicap”. He points out that stats Ph.D.’s have a big disadvantage in the job market because statistics has an old-school journal-oriented publishing culture, so students publish much less and have less experience engaging with a research community. CS is conference-oriented — certain conferences have a higher prestige than many journals (e.g. NIPS in ML, CHI in HCI) — and this results in faster turnaround, dissemination, and collaboration. (I’ve heard others make similar comparisons between CS and psychology.) I’d expect any discipline with a larger conference emphasis to have better courses since they should reward presentation/teaching skills — or at least encourage practice — more than in journal world.

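## Doing data mining with machine learning algorithms (many of which, of course, were refined on the basis of statistical theory)
## Below are some views on statistics and data mining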
Another issue is the definition of statistics itself. In 1997, Jerome Friedman wrote an extremely interesting analysis of the situation: “Data Mining and Statistics: What’s the Connection?”. He points out, quite correctly, the statistical impoverishment of some common approaches to data mining. You can certainly blame statistics for not marketing its ideas well enough, or blame CS for ignoring statistics.

## Some views follow: with statisticians taking this much of a beating, here is how they might console themselves, Ah Q style.
That is not to say statistics is not important — it’s incredibly important. He quotes Efron (the main contributor to bootstrapping in statistics) as saying “Statistics has been the most successful information science.” However, information science is becoming bigger and broader and more exciting, thanks to computation and ever-increasing amounts of data. What should statisticians do? Friedman continues (light editing and emphasis is mine):


One view says that our field should concentrate on that small part of information science that we do best, namely probabilistic inference based on mathematics. If this view is adopted, we should become resigned to the fact that the role of Statistics as a player in the “information revolution” will steadily diminish over time.

Another point of view holds that statistics ought to be concerned with data analysis. The field should be defined in terms of a set of problems — rather than a set of tools — that pertain to data. Should this point of view ever become the dominant one, a big change would be required in our practice and academic programs.
First and foremost, we would have to make peace with computing. It’s here to stay; that’s where the data is. This has been one of the most glaring omissions in the set of tools that have so far defined Statistics. Had we incorporated computing methodology from its inception as a fundamental statistical tool (as opposed to simply a convenient way to apply our existing tools) many of the other data related fields would not have needed to exist. They would have been part of our field.

Friedman wrote this article more than 10 years ago. All his observations about the importance and increasing prevalence of data and computing power are even more true today than back then. Has the field of statistics changed? Not clear. (I’d appreciate seeing evidence to the contrary.)


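## To sum up, and speaking honestly: econometrics in the University of Auckland's economics department also gives you something like "statistics-style analysis".
## By "statistics-style analysis" I mean this: the statistics department is where you learn why the methods work, but the other departments all use them, hand you the relevant data, and show you how to apply them.
## Statistics at the University of Auckland often disappoints people who expect it to be taught like actuarial science in Australia, all probability models, or like statistics in China, mostly mathematics.
## It isn't! Statistics at Auckland now mainly serves natural sciences such as biology and medicine. If you want statistics with a social-science bent, spare yourself the misery and choose economics, sociology, or psychology instead (though psychology at Auckland actually leans more toward brain and cognitive science).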
I know that I’m interested in quantitative information science, including statistics and data analysis. Machine learning has many strengths, but it is definitely an odd way to go about analysis. But there’s a good case that statistics, as traditionally defined, is only going to have a smaller role in the future. “Data mining” sounds more relevant, but does it even exist as a coherent subject? Maybe it’s time to study a more applied statistical field like econometrics.


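Comment
Below is some discussion from students who are in neither computer science nor statistics but who use both. That is more convincing than a statistics student praising statistics or a CS student praising CS, which is just a seller praising their own goods.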

Chemometrics (Chinese: 化学计量学)
I come from yet another closely related field: chemometrics which is usually defined as applying statistics to chemical problems/data. Never heard machine learning in the place of statistics here. But chemometrics is heavily focused on prediction (also DoE, but far less about hypothesis testing)

I don't think it is fair to exclude prediction from statistics.  

I rather see a difference in the approach (Ahmed's culture): My guess would be that machine learning is maybe more pragmatic than "pure statistics": if machine learning has an algorithm that solves a problem that's good. Statisticians tend to want thorough theoretical foundations as well. Chemometrics would also be more on the pragmatic side.
(Source: personal experience with chemometrics, where e.g. partial least squares regression has an extremely successful track of records for some 30 years now, including industrial application. Statistics now start to take the approach seriously because finally some statisticians bothered to have a look at the mathematical properties - before it was just an algorithm that happened to work very well with the chemometric data sets).
