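Authors: J. Berenike Herrmann, Arthur M. Jacobs, Andrew Piper; translated by Wang Heng. Reposted from: DH 数字人文

J. Berenike Herrmann / Digital Humanities Lab, University of Basel, Switzerland

Arthur M. Jacobs / Dahlem Institute for Neuroimaging of Emotion, Freie Universität Berlin, Germany

Andrew Piper / Department of Languages, Literatures, and Cultures, McGill University, Canada

Wang Heng (translator) / freelance translator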

————————————————————-

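Abstract: Computational Stylistics (CS) is a field that uses computational and statistical methods to study the forms, social embedding, and aesthetic potential of literary texts. Working on larger data sets with more transparent methods, CS offers literary studies new scales of observation and new methods of interpretation, testing existing theories and forming new ones. As in many data-driven fields, its methods span exploratory, explanatory, and predictive modeling, with important debates about the affordances and limitations of each kind of modeling. Having evolved from several traditions, including authorship attribution, stylistics, and natural language processing, CS now tackles more ambitious theoretical questions, among them: style, genre, and epoch; literary topoi, plot, and character networks; narrative perspective, characterization, and emotion; gender, race, and social status; canonicity, literariness, and textual quality; and cognitive representations of word beauty, metaphor, and rhyme. Within the data sciences, CS comprises distinct knowledge domains in which the affordances of the digital (method, medium) and the statistical interact with the epistemic to produce new knowledge at the analytic levels of “text,” “context,” “author,” and “reader.”

Keywords: computational stylistics; data science; literary study; corpus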

————————————————————-

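Introduction

Computational Stylistics (CS)[1] is an emerging field that seeks to model the multidimensional phenomena of “literary discourse” by computational and statistical means. Applying a data-driven paradigm, CS pursues a systematic, historical study of topics such as fictionality and aesthetics, questions of cultural capital, prestige and inequality, and reader reception. It is concerned with the form, content, and social, cultural, and cognitive functions of literary texts as embedded in social and historical contexts.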

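CS is situated within the larger frameworks of data science and the digital humanities. It takes advantage of the new opportunities opened up by the large-scale datafication and integrated processing of textual cultural heritage (HATHI, the German Text Archive, Wikidata, Gutenberg, etc.), born-digital data (such as social reading and writing platforms), and linked open data (such as national libraries, Europeana, and Wikidata). CS uses software for text processing, machine learning, visualization, and statistics (e.g., Python, R, Java) and applies empirical, data-driven, and variable-driven techniques to explore data sets, establish principles, and explain the occurrence of textual and contextual patterns and their relation to the potential formal-aesthetic, social, and cognitive functions of literature. In this article, these perspectives are summarized broadly as the “formalist,” “social,” and “cognitive” approaches to literary study.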

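The data-driven disciplinary traditions most relevant to CS are stylometric authorship attribution,[2] corpus linguistics,[3] corpus stylistics,[4] natural language processing,[5] humanities computing,[6] the sociology of culture,[7] empirical reader-response theory,[8] and neurocognitive and computational poetics.[9] Although CS is rooted in data-driven text analysis, newer lines of research connect text modeling with reception and contextual data, such as online book ratings, demographic data, and social and economic indicators.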

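Although CS has a clear “empirical” orientation, it evidently also includes approaches that attend to “hermeneutic” processes, such as the interpretive, subjective dimensions built into computational methods.[10] One challenge for the future of CS lies in integrating these different epistemic traditions and attending more clearly to the relation between research goals and research methods. Our vision for the field is not to promote it as the single best model of research practice, but to keep it deliberately heterodox in matters of theory and methodology.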

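Like literary studies in general,[11] CS has a strong historical orientation, visible in the socially inflected “distant reading” of world literature[12] and in long-standing reflection on questions of periodization.[13] Turning literary texts and contextual information into data goes beyond the limited horizon of an individual researcher’s reading[14] and allows scalable observation of historical frames. In this way, existing time frames can be reassessed and new ones discovered by drawing on longer arcs[15] and on potentially new period groupings within those frames.[16]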

Contributions to the history of computational stylistics can be found in journals such as Digital Humanities Quarterly, Digital Scholarship in the Humanities (formerly Literary and Linguistic Computing), the Journal of Cultural Analytics, and the Zeitschrift für digitale Geisteswissenschaften (ZfdG), as well as in the proceedings of the Association for Computational Linguistics (ACL) Workshops on Computational Linguistics for Literature, now held under SIGHUM as the SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature. In addition, in 2017 the Alliance of Digital Humanities Organizations (ADHO) established the special interest group (SIG) Digital Literary Stylistics (SIGDLS), and in March 2021 Computational Literary Studies Infrastructure, a multilateral project funded by the European Commission, was launched.[17]

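We aim to give an overview of the emerging research questions in this diverse and fast-moving field, and of the theories, concepts, and methods that are being developed to address them. We do not focus on the “how-to” of computational stylistics; instead we introduce readers to the varied and lively range of current research directions. While many handbooks and resources cover the application of computational text analysis,[18] good overviews of how CS affects our understanding of literary discourse are lacking.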

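I. Toward a Map of the Research Field

CS offers a variety of ways to explore, build, and test literary theories, covering a broad range of text-, context-, author-, and reader-oriented approaches. We divide our overview into three broad categories: (1) formalist approaches, (2) social approaches, and (3) cognitive approaches. It is important to stress that they necessarily blend into one another. Each of the three groups also includes systematically historical approaches to literary study. Indeed, the modeling of historiographic approaches to literary change, together with the question of periodization, is an emerging field in its own right.[19]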

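(1) The basic concerns of formalist approaches come from poetics and aesthetics and focus on understanding the distinctive features and structures of literary works. Such approaches may address the ways of writing that constitute literariness,[20] poeticity,[21] fictionality,[22] novelty,[23] or literary quality.[24] They also seek a deeper understanding of the nature of genre,[25] period,[26] or authorial style.[27]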

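(2) Social approaches, by contrast, are more concerned with extratextual context, such as the kinds of social practice found in different communities. This line of research has a long history, going back to Herder, Madame de Staël, and other thinkers of the early nineteenth century, who essentially regarded literature as a historical and social event. It builds on Bourdieu’s[28] theory of social distinction and cultural capital, on critical theory,[29] and on New Historicism’s emphasis on the power of social context.[30] The questions it raises probe the observable textual and contextual factors behind “distinction,” “canonicity,” and “prestige,” and thereby behind notions of social, political, and economic power. A related line of research draws on the influential work of book historians[31] and focuses on reader behavior and reading communities.[32] The goal here is not to infer “intrinsic” literary quality or value but to assess the discursive priorities of different reading communities, understood as part of a larger field of social interaction.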


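(3) Finally, cognitive approaches draw on the “cognitive” branch of aesthetics and poetics,[33] the psychology of literature,[34] and reader response.[35] The aim is to integrate insights from computational text analysis with knowledge about readers’ embodied emotional and cognitive systems. Studies examine the features of literary texts and their effects on readers, measured with information from surveys, real-time data, and other kinds of reader-response data.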

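This taxonomy serves heuristic purposes only. Given the growing diversity and complexity of the field, it may offer a useful frame and a navigational introduction for scholars new to the field as well as established ones. We are aware, however, that it necessarily leaves out a great deal of the actual overlap, difference, and interconnection between approaches.[36]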

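The main purpose of this overview is to stress that computational approaches to literary study do not prescribe any single theoretical principle or type of question. The topic of “literary quality,” for example, can be approached in predominantly formalist terms (do high-brow books share observable textual features when compared with low-brow books?), social terms (does literary evaluation change over history as a function of dominant national ideologies?), or cognitive terms (does readers’ appreciation of the same text change depending on whether it is presented as “high-,” “middle-,” or “low-brow”?). In a rapidly expanding field, these frames can be as broad as possible and will still inevitably miss important work. What we discuss should therefore also be read as potential points of entry for future research. In the conclusion we identify three fundamental challenges that future research should prioritize, concerning data collection, methodological validity, and interdisciplinary integration.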

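II. Formalist Approaches

(1) Style

Broadly speaking, style is a “way of writing” or a “set of formal features.”[37] Any form-oriented study of literary language use necessarily has a “style” dimension, in which forms and patterns are taken to indicate higher-level textual and contextual phenomena. Questions of style, for instance, usually also involve meaning, so that stylistic patterns can be regarded as potential textual correlates of cognition and emotion in the minds of readers and authors.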

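Classical stylometry has established that the most frequent words, especially function words such as the, of, and in, are the most reliable parameters for measuring stylistic similarity and difference.[38] “Non-canonical stylometry” applied to literary texts builds on these findings and experiments with a wide range of features (including content words and units beyond the single word). Here style is treated descriptively, as the linguistic particularity of some entity, above all an author.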
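The reliance on most frequent words can be illustrated with a minimal sketch in the spirit of Burrows’s Delta (the measure cited in note 38). It is an illustration under stated assumptions, not a reference implementation: the function names, the toy corpus, and the use of the population standard deviation are choices made here for brevity, and real studies work with hundreds of frequent words and much longer texts.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(tokens, vocabulary):
    """Relative frequency of each vocabulary word in one tokenized text."""
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in vocabulary]


def burrows_delta(corpus, n_mfw=100):
    """Pairwise Delta-style distances for a corpus.

    corpus: dict mapping a text name to a list of lowercase tokens.
    n_mfw:  number of most frequent words to use (hypothetical default).
    Returns a dict mapping (name_a, name_b) to a distance.
    """
    # 1. Pick the most frequent words across the whole corpus.
    overall = Counter(tok for tokens in corpus.values() for tok in tokens)
    vocabulary = [word for word, _ in overall.most_common(n_mfw)]

    # 2. Relative frequencies of those words in every text.
    names = sorted(corpus)
    freqs = {n: relative_frequencies(corpus[n], vocabulary) for n in names}

    # 3. Standardize each word's frequencies (z-scores) across texts.
    #    Assumption: population standard deviation; words with no
    #    variation contribute nothing to the distance.
    z = {n: [] for n in names}
    for i in range(len(vocabulary)):
        column = [freqs[n][i] for n in names]
        mu, sigma = mean(column), pstdev(column)
        for n in names:
            z[n].append((freqs[n][i] - mu) / sigma if sigma > 0 else 0.0)

    # 4. Distance = mean absolute difference of z-scores.
    distances = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            distances[(a, b)] = mean(abs(x - y) for x, y in zip(z[a], z[b]))
    return distances


if __name__ == "__main__":
    toy_corpus = {  # invented example texts
        "A1": "the cat of the house in the garden".split(),
        "A2": "the cat of the house in the garden".split(),
        "B": "in in in of of the dog".split(),
    }
    for pair, distance in burrows_delta(toy_corpus, n_mfw=3).items():
        print(pair, round(distance, 3))
```

A smaller distance means that two texts use the chosen frequent words in more similar proportions; as the article goes on to stress, such a value indicates likelihood, not a fingerprint.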

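Research on authorial signal is complemented by studies assessing the signals of genre,[39] period,[40] and fictional character.[41] To establish particularity and specificity, a variety of descriptive and multivariate statistical techniques can be applied, usually with reference to some kind of “norm” (a corpus representing other authors, periods, or characters).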

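An important insight is that there is no “stylistic fingerprint” that would assign a text indisputably to an author, genre, period, or character; rather, research works with probabilistically determined likelihoods. Stylometry continues to be used for literary attribution and description, for example by Schöch on French crime fiction[42] and by Tuzzi and Cortelazzo on Elena Ferrante.[43] Current research increasingly combines explanatory and exploratory, quantitative and qualitative approaches. Rebora et al., for example, examine the notion of stylistic uncertainty in the case of Robert Musil, reporting detailed probability measurements for Musil’s own authorship as well as for a candidate who had not previously been considered.[44] A subsequent descriptive “specificity analysis”[45] found that the authorial differences between the two in style and content are highly interpretable.[46]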
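A “specificity analysis” of the kind mentioned above (note 45 names Craig’s and Eder’s Zeta) can be sketched roughly as follows. This is a simplified illustration, not the procedure of the cited studies: it uses the difference form of the score (share of target segments containing a word minus share of reference segments containing it, ranging from -1 to 1), and the segment size, function names, and example tokens are assumptions made here.

```python
def split_into_segments(tokens, size):
    """Cut a token list into consecutive full segments; each becomes a word set."""
    return [set(tokens[i:i + size])
            for i in range(0, len(tokens) - size + 1, size)]


def zeta_scores(target_tokens, reference_tokens, segment_size=1000):
    """Zeta-style distinctiveness score for every word in either group.

    A score near 1 marks a word present in most target segments and few
    reference segments; a score near -1 marks the opposite.
    """
    target = split_into_segments(target_tokens, segment_size)
    reference = split_into_segments(reference_tokens, segment_size)
    if not target or not reference:
        raise ValueError("each group needs at least one full segment")

    vocabulary = set().union(*target, *reference)
    scores = {}
    for word in vocabulary:
        # Document proportion: share of segments in which the word occurs.
        share_target = sum(word in seg for seg in target) / len(target)
        share_reference = sum(word in seg for seg in reference) / len(reference)
        scores[word] = share_target - share_reference
    return scores


if __name__ == "__main__":
    # Invented token lists standing in for two groups of texts.
    group_a = "war and peace war and love war or".split()
    group_b = "tea and cake tea and milk tea or".split()
    ranked = sorted(zeta_scores(group_a, group_b, segment_size=4).items(),
                    key=lambda item: item[1], reverse=True)
    for word, score in ranked:
        print(f"{word:6s} {score:+.2f}")
```

Because the score counts segments rather than raw occurrences, it favors words that are used consistently across one group, which is what makes the resulting word lists comparatively easy to interpret.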

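In corpus-based literary study, Mahlberg et al. have dealt with the narrative phenomenon of the “suspended quotation,” in which the narrator interrupts a fictional character’s speech.[47] Assessing lexico-grammatical patterns in the works of Charles Dickens, they found relations between suspended quotations and the representation of body language across literary texts and time. “Close reading with computers”[48] is another current of research, which conducts computational micro-analysis through quantitative comparison of small corpora or single literary works, such as Joseph Conrad’s Heart of Darkness.[49]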

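CS also offers interesting perspectives for renewing author-oriented stylistic criticism, using automatic word classification and psychological dimensions of meaning (e.g., coherence, temporal orientation, and emotion). In such psychostylometry, with due caution, indicators of the psychological states and traits of literary authors can be constructed by analyzing words that point to emotions, bodily states, and cognitive processes.[50] Function words in particular (conjunctions, prepositions, pronouns) can reveal much about the psychological and social processes expressed in language, including emotional states, cognitive complexity, and sociality.[51] A stylometric approach within digital genetic criticism (critique génétique) has great potential for digitized[52] or born-digital[53] “dossiers.”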

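(2) Literariness and Fictionality

Since Aristotle, philosophers and linguists have tried to understand the difference between communicative and creative uses of language (Aristotle’s contrast between saying [legein] and making [poiein][54]). This research asks which textual features can be correlated with the effects attributed to literary language,[55] and whether there are distinctive qualities of literariness or fictionality that hold across time and space.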

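“Literariness” and “fictionality” depend on the particular theory; they can overlap, complement, or even include one another. Russian Formalism, a precursor of CS, for example, aimed at a clear overall inventory of forms, that is, of “what makes a given work a literary work.”[56] This overarching notion of “literariness” largely agrees with the formalist view of fictionality as an intentionally signaled invention used for communication,[57] or as referring primarily to the “artful use of language” (Hakemulder et al., this volume), in which creative, foregrounded language use may capture readers’ attention.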

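Underwood and Sellers,[58] using a large corpus of (English-language) texts spanning different historical periods, provided one of the earliest CS studies of generalizable markers of “literariness,” noting that variation internal to literary genres complicates the attempt.[59] Piper took a theory-driven approach to “fictionality,” examining several fiction and nonfiction data sets in two languages from the nineteenth century to the present.[60] He tested three hypotheses: “literary negativity” (literary texts have no distinguishing features); “the priority of the novel’s referential realism” (what has most defined novels over the past two centuries is their investment in realism); and “literary periodization” (the historical development of fiction is best described in terms of distinct clusters or periods). On the basis of his findings, Piper rejected all three hypotheses and proposed three alternative, data-driven theories of fiction: legibility, immutability, and sensibility. More research, with more data and more languages, is needed to test these and other hypotheses.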

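Data-driven CS methods allow researchers to test, rather than merely theorize, the stylistic features that distinguish literary/fictional from non-literary/non-fictional discourse. Through such tests, researchers are beginning to understand better the aesthetic qualities that underlie the social distinction and social function of literary texts, moving gradually from normative evaluation to more descriptive models of social and human behavior.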

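(3) Metaphor

Metaphor, defined as “giving the thing a name that belongs to something else” (Aristotle, Poetics, ch. 21, 1457b1-30), is widely regarded as a hallmark of literature. Metaphorical language (including idiom, personification, and simile) serves the need for vivid imagery, creative recontextualization, and the expression of subjective thought.[61] Metaphorical language corresponds to figurative processes that structure domains of human experience in everyday discourse[62] and in literature.[63]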

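The two key tasks of identifying and interpreting metaphor by computer remain a considerable methodological challenge. CS therefore pursues methodological research questions while implementing models of metaphorical thought and language within formal systems. Specifically, natural language processing (NLP) is used to detect metaphorical expressions automatically and to model how world knowledge, which according to conceptual metaphor theory is necessarily embodied, can be represented in computational systems for metaphor processing.[64]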

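Reliable detection, whether manual or automatic,[65] is a precondition for answering questions about the distribution of metaphor across genres, authors, and time.[66][67] Empirical research has stressed the ubiquity of metaphor, and CS research has begun to ask about the specificity of literary metaphor in terms of distribution, semantic and syntactic features, and aesthetic function.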

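Dorst examined the fiction sample of the VUAMC,[68] which covers the full range of conventional and creative metaphor use in contemporary English, and found an average rate of metaphor use of 11%, lower than in news discourse but higher than in spontaneous conversation.[69] Herrmann, in a close study of a sample of openings of German-language fiction (1880-1926), reported an average of 14% metaphor use, with realist and “high-brow” texts showing fewer metaphors and popular and “fashionable” texts tending above the average.[70] As to differences over time, she noted fewer metaphors in texts after 1920. Anderson et al.,[71] in completing the English Historical Thesaurus, asked more generally how particular metaphors developed over the centuries, which extratextual factors they respond to, and which domains of experience are most prominent in metaphorical expression.[72] Studying the conceptualization of reading processes in online book reviews, Herrmann and Messerli found evidence in a corpus of 1.3 million reviews[73] for underlying mappings between the conceptual domains “reading” and “food”[74] and “reading” and “motion.”[75]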

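The distinction between “conventional and deviant” metaphor[76] or between “deliberate and non-deliberate use”[77] is one of the most promising current lines of research, with clear potential in vector space models that adequately represent a particular range of discourse (e.g., German modernism). So far, corpus-stylistic research based on the VUAMC fiction sample has shown a strong predominance of conventional, non-creative metaphor, often in the form of personification.[78] Related issues include “metaphor aptness,” cutting across genres, and reader response, the last of which includes computational modeling of readers’ judgments.[79] Combining ratings with stylometry, Jacobs and Kinder found structural differences between “literary” metaphors (created by famous poets) and “non-literary” metaphors (created by non-professional authors), including sonority, length, and surprisal value.[80]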

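CS research on metaphor depends to a large extent on results from NLP and corpus linguistics, on progress in automatic detection, and on the annotation of large data sets. At the same time, there is a great need for dedicated studies of metaphor in particular literary genres, using domain-specific corpora to answer questions about frequency distributions as well as about the types and functions of metaphor, especially its ambiguity and polysemy. Most promising are projects that focus on identifying conventionality and novelty, and projects that integrate reader ratings with CS analysis, which move the field toward a data-driven theory of how metaphor engages the reader’s brain.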

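(4) Narratology and Theory of Drama

Computational research on narrative and dramatic structure is developing rapidly,[81] with particular attention to characterization, character networks, narrative perspective, narrative time,[82] and plot structure.[83] Related work in computational linguistics and information studies focuses on “event detection,” an area that is increasingly seen as an element of larger narratological concepts such as “plot” and “perspective.” Sims et al. have developed new methods for detecting “real” as opposed to hypothetical events in literary narrative, that is, events that actually occur in the narrative, and have shown that irreality correlates strongly with literary prestige.[84]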

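In natural language processing, advances in entity recognition, dependency parsing, and co-reference resolution[85] have opened up inventive research on literary character. Piper,[86] using the BookNLP resource,[87] found that literary characters in novels are linguistically more homogeneous than other kinds of nouns. The distinctiveness of characters, rather than being a function of richer embodied description, may therefore derive either from their speech or from readers’ inferences. In the same study, Piper showed that heroines created by nineteenth-century women novelists display significantly higher levels of semantic “interiority” than other types of character, providing supporting evidence for the role of women writers in shaping the genre. Cheng observed that descriptions of bodily change carry different semantic associations for male and female characters, offering a more nuanced picture of how literary characters are gendered.[88] Similarly, Jacobs’s SentiArt tool, building on recent advances in sentiment analysis, computes emotional and personality profiles of fictional characters such as Harry Potter and Voldemort, and identifies characters on the basis of generic personality traits such as “extraversion” and “emotional stability.”[89]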

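While much of this research treats characterization as a form of description, characters are also crucially shaped by the ways they communicate and mentalize, which has prompted work on “narrative perspective.” CS also addresses how the representation of speech, thought, and writing constructs fictional characters, shapes narrator-character relations, and characterizes the narrator’s stance.[90] Burrows examined characters’ idiolects, the distinctive stylistic patterns of individual speakers within a text.[91] Semino and Short were the first to operationalize and compare “perspective” in fictional and non-fictional genres,[92] a framework applied by Busse[93] and extended to German by Brunner.[94] Hoyt Long and Richard Jean So, as well as Brooke et al., have explored computational methods for studying “free indirect discourse.”[95]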

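Questions about character dialogue are receiving growing attention: studies of the novel[96] and of drama[97] point to similarities between fictional dialogue and face-to-face communication, and between narrative passages and more informational prose. Muzny et al. suggest many promising avenues for further research, noting among other things that, in terms of higher-level abstract grammatical features, free indirect discourse is a narrative sublimation of dialogue.[98]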

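CS also studies the role of social networks in shaping literary discourse. Ardanuy and Sporleder argue that genres have a “social network” signature that predicts genre classification as strongly as “semantic” features do.[99] Other studies have assessed the similarity between fictional and real-world networks,[100] compared urban and rural settings in novels,[101] traced the evolution of English dramatic genres over three centuries,[102] and examined authorship in French classical drama.[103] Trilcke has carried out network analyses of German drama,[104] while Fischer and Skorinkin set out the state of the art for Russian fiction and drama.[105] Tangherlini has used network models to analyze the narrative structure of “legend,” ranging from folklore to contemporary conspiracy theories.[106] A great deal of research is still needed to understand and define more precisely the nature of “relations” and “interactions” in literary texts as the means by which social networks are constructed.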
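How a character network might be derived from a text can be sketched in a few lines. The example assumes the simplest possible definition of “interaction,” namely co-presence in the same scene, which is only one of several operationalizations and not necessarily the one used in the studies cited above; the function names and the scene list are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(scenes):
    """Build weighted, undirected edges from scene-level co-presence.

    scenes: list of lists of character names, one list per scene.
    Returns a Counter mapping an alphabetically ordered pair to the
    number of scenes the two characters share.
    """
    edges = Counter()
    for scene in scenes:
        # Each unordered pair of distinct characters counts once per scene.
        for a, b in combinations(sorted(set(scene)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum of edge weights per character (a simple centrality measure)."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


def density(edges, n_characters):
    """Share of all possible character pairs that are actually connected."""
    possible = n_characters * (n_characters - 1) / 2
    return len(edges) / possible if possible else 0.0


if __name__ == "__main__":
    # Hypothetical scene segmentation, not taken from any edition.
    scenes = [
        ["Hamlet", "Horatio"],
        ["Hamlet", "Gertrude", "Claudius"],
        ["Hamlet", "Horatio"],
    ]
    edges = cooccurrence_network(scenes)
    print(dict(edges))
    print(weighted_degree(edges).most_common())
    print(round(density(edges, n_characters=4), 2))
```

Measures such as density or degree distributions are the kind of network features that can then be compared across genres or periods; what counts as a “relation” in the first place remains the open question the paragraph above points to.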

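(5) Narrative Time and Space

Temporality is one of the defining features of narratives, since they are representations of sequences of events.[107] In narrative theory, time is both a dimension of the narrated world and an analytical category for describing how a story is told. Time is generally divided into “story time” (the temporal dimension of the narrated world), “discourse time” (the time of telling, i.e., the time taken to narrate the fictional events, including the aspects of order, duration, and frequency), and “narrating time” (the temporality of the act of narration, describing the spatio-temporal position of the narrative voice).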

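Underwood’s[108] model of story time in fiction shows that over the past three centuries the duration of discourse events as a function of story time has declined dramatically. 萨挈斯 and Piper,[109] Meister,[110] and Meister and Schernus[111] have all proposed theoretical frameworks for the multidimensional modeling of time. Ikeo has studied the distribution of verb tenses in contemporary fiction.[112] Given that most text-based models in current use treat literature as a static unit (whether as a bag of words or as a vector), much future research is needed to understand better the implications of distinguishing story time from discourse time and narrating time.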

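Computational research on the relation between literature and “space” has been flourishing.[113] Its mainstay is accessible geographic information systems (GIS), which integrate databases and visualization and allow spatial modeling of the textual and contextual dimensions of literature. Pioneering projects on the fictional landscapes of literary texts include Piatti’s A Literary Atlas of Europe[114] and Cooper and Gregory’s Mapping the Lakes.[115] Wilkens has shown how researchers can integrate socio-demographic and literary data to challenge long-standing theories about the representation of geography in American fiction in the aftermath of the Civil War.[116] Research at the Stanford Literary Lab has reported the growing importance of “London” in nineteenth-century British fiction, and its inquiry into mapping in fiction links emotions within the London cityscape to relations of social power.[117]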

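As future research differentiates more fully and precisely between questions of world-building, narration, and extratextual time and space and their interrelations, the computational modeling of literature’s temporal and spatial dimensions holds great research potential.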

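(6) Sound and Vision

Research in literary studies, linguistics, and book history has long stressed the importance of the visual and phonic dimensions of texts and of their performative and experiential effects. Beyond words and letters, texts also include graphic components (spacing, typeface, illustration, etc.) and a basic phonic layer encoded through the components of words and their pronunciation. New research in the field is beginning to address these extratextual dimensions of literary discourse.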

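“Sound” in its broadest sense can be understood as consisting of multiple linguistic layers that combine with semantic features of various kinds[118] and play a particular role at both the prosodic and the phonetic levels of language. Examples include inventories of sounds related to core units such as phonemes and tonemes; phonologically based stylistic devices such as rhyme and meter;[119] other figures of phonological recurrence;[120] and acoustic, suprasegmental features above the level of the individual sound. Some features interact regularly (such as poetic rhythm and meter) or frequently co-occur and influence one another (such as end rhyme and meter).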

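Together, these layers make up the specific “melody” of a literary text, that is, its musical character. They in turn give rise to different (re)presentations of sound (abstract-phonological, acoustic-prosodic), which have also been studied in connection with the different modes of reception that shape them (silent reading, reading aloud, listening to a literary text being read). Although the tonal and rhythmic dimensions of discourse are not limited to poetry, they are the hallmark of a genre that raises a distinctive set of research questions (versification combined with syntactic and semantic openness, a high density of formal devices) and that can generally be distinguished from prose.[121]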

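Noteworthy new research in CS addresses the auditory and performative qualities of texts. Katsma has offered a stylometric approach,[122] while Clement and McLaughlin have introduced methods for assessing audience behavior, such as applause, in audio files of literary readings.[123] MacArthur et al. have made new advances by examining twelve possible acoustic measures for spoken texts to test for the existence of “poet voice” (a familiar performance style common in poetry readings).[124] As audio files of literary texts (audiobooks and poetry readings) become more plentiful, a great deal of new research can explore the relation between the semantic and the acoustic features of texts.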

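Book historians and media studies scholars have long stressed the importance of the material properties of texts. These can include a text’s visual elements, such as binding, illustrations, or covers, as well as the physical formats in which literature circulates and oral formats such as poetry readings or audiobooks. The field of image analytics may still be new, but it already plays a part in CS research. The study by Houston and 尼尔 was the first to explore the significance of white space in distinguishing genres;[125] Fyfe and Ge have developed initial metrics for the study of newspaper illustration;[126] and Piper et al. have applied document image analysis (DIA) to study the evolution of scientific graphic practices,[127] the conventions, including footnotes, tables, charts, and illustrations, that have underpinned truth claims since the eighteenth century.[128]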

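These innovative approaches go beyond a basic understanding of texts as linguistically grounded and begin to consider other material dimensions of reading and textual meaning. Combining the auditory and the visual levels opens important new avenues for studying literature’s various medial forms and their reception by readers.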

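III. Social Approaches

(1) Inequality, Bias, and Identity

Research on large numbers of texts and their historical and social representativeness already implies a social dimension, and CS has also begun to develop research frameworks with an explicitly social orientation. One key area concerning the relation between representation and social power is the study of inequality. Drawing on theoretical work in cultural studies and the Birmingham School,[129] research in CS is now exploring the inequality and bias expressed in literary and other cultural archives. This work adopts the two-tiered theory of “representation” derived from Spivak:[130] researchers examine representational bias at the level of agents (authors, characters, publishers, editors) and at the level of form (style, semantics, topoi, etc.[131]). Underwood et al., for example, have shown that the share of women among fiction authors declined sharply in the twentieth century (the first form of representation), but that the qualitative differences in how male and female characters are described also diminished, which, at the second level of representation, indicates increasing gender equality.[132] Follow-up research by Kraicer and Piper, however, has shown how male characters continue to outnumber female characters across genres and readerships in contemporary fiction, indicating that a long-standing gender hierarchy persists.[133] In addition, they used stochastic models to demonstrate strongly heteronormative norms in fictional social networks. Tatlock et al., using library records from the well-known Muncie, Indiana data set, examined gendered reading habits in order to identify literary qualities across readerships.[134]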

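Beyond gender, researchers also study questions of race in relation to literary and cultural production. So et al. have used “text re-use algorithms” to better understand how different racial groups cite the Bible in literature.[135] Algee-Hewitt et al. have used collocation analysis to study the proliferation of racial slurs in a set of 18,000 American novels.[136] Evans and Wilkens have shown how writers of different ethnic origins construct imaginative spaces of Britishness in different ways,[137] while Lee et al. combine topic modeling and networks of publishers to rethink questions of race in Shakespeare’s works.[138]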

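Alongside these more commonly applied methods, there is a rapidly developing theoretical strand of research that aims to move beyond the limits of binary modeling and to envisage more nuanced ways of studying race, gender, and sexuality.[139] Future research will increasingly bring these two areas into dialogue with each other in order to build more flexible models for questions of identity, bias, and inequality.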

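(2) Prestige and Literary Quality

Another line of research examines how readers, professional critics, and institutions evaluate literature, while attending to differences between reading communities and literary institutions. The goal here is not to infer “intrinsic” literary quality or value systems but to assess the discursive priorities of different reading communities, understood as part of a larger field of social interaction. This line of research has largely been shaped by Bourdieu’s theory of social distinction and cultural capital.[140] If distinction defines a book’s success and social significance, what are the observable textual and contextual factors of such distinction? The question extends to larger issues of canonicity and prestige, and ultimately of social, political, and economic power. Within this predominantly social perspective, some approaches have a stronger “formalist” bent and attend more closely to the textual correlates of “literary quality.”[141]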

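Piper and Portelance identified marked stylistic differences between prizewinning and bestselling works of contemporary fiction,[142] which clearly revolve around the question of “time”: nostalgic time frames are more prevalent in high-cultural products.[143] A similar study by Jannidis et al. compared high and low literature in German and found evidence of greater genre-internal variation in popular texts than previously assumed.[144] Underwood examined historical change in “literary prestige” from the mid-nineteenth to the mid-twentieth century on the basis of specialist (“elite”) literary periodicals and found that it became increasingly predictable from semantic features whether a novel or volume of poetry would be reviewed (and thereby granted “prestige”).[145] He also found an increase in stylistic stratification between prestigious and non-prestigious writing from the mid-nineteenth century onward. Ashok et al. point to a positive correlation between sentence complexity and literary success.[146] Similarly, van Cranenburgh et al. examined the association between textual features of Dutch-language novels and readers’ evaluations and found that higher semantic complexity goes along with higher reader ratings.[147] For American English poetry, Kao and Jurafsky reported that a key indicator of “high-quality” poetry is the frequency with which it refers to concrete objects.[148] Continuing this line of work, Jacobs has offered a comprehensive list of factors that determine literary quality, including word familiarity, sentence embedding, coherence, and cohesion.[149]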

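Using an annotation procedure and a rule-based approach grounded in a taxonomy of literary values, Herrmann and Messerli examined “literary quality” in online lay book reviews. They found a large proportion of effect-oriented “hedonistic” values but also form-oriented values concerned with writing style and composition.[150]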

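(3) Social Reading

In The Digital Literary Sphere, Murray[151] draws on Darnton’s[152] earlier work in book history to argue for the importance of studying literature within a larger digital ecosystem, an important part of which is the space of reader response that can be captured online. Going back to the work of Fish, the guiding theoretical framework is that different groups of readers jointly construct the meaning of texts.[153] Rebora et al. present the state of the art in research on “digital social reading” (and writing), examining interactions in at least ten major categories, including reading-oriented, institutional, and community-oriented research.[154] Work in this direction includes analyses of differing rating behavior on platforms such as Goodreads and Amazon;[155] of the relation between the sentiment of reader reviews and that of the texts they respond to;[156] of the different structures of attention of academic as opposed to “ordinary” readers, as seen in scholarly citations of books and the “shelving” behavior of Goodreads readers;[157] and of the construction of the literary canon in academic anthologies versus the Google Knowledge Graph.[158] Riddell and van Dalen-Oskam explored the level of divergence between readers’ self-reports and found far more overlap between them than theory had predicted.[159] Another promising area in this direction uses “text re-use algorithms” to understand how literary works are quoted by professional or lay readers,[160] and the social re-use of texts in nineteenth-century American periodicals.[161] Bode’s research provides important new resources for studying the social circulation of texts in Australian periodicals.[162]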

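A major challenge for the study of social reading practices is access to corporate data; the algorithms used to present and organize those data are equally hard to obtain. Another challenge is how far online behavior can be generalized, given the population biases of different platforms. Still, this approach promises to move beyond the fictional, idealized reader that traditional literary studies has often used in place of studying actual reader behavior.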

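Alongside the text-analytic approaches above, there is a lively tradition of empirical reader research that studies readers’ emotional evaluations of literary texts. Schindler et al. outline a new questionnaire-based method for identifying aesthetically evoked emotions,[163] and Menninghaus et al. offer a survey of definitions of aesthetic emotion.[164] Future research will focus on developing more fluid definitions of “quality,” and the study of readers’ emotional responses will be integrated into large-scale text analysis. As we explain in the conclusion, this is one of the most promising avenues for future CS research.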

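IV. Cognitive Approaches

One strand of CS research studies the relation between readers and texts from a cognitive perspective. The examples above concerned studies of appreciation and prestige, with judgments made by professional readers in the form of reviewers or prize committees, as well as lay readers’ cognitive and emotional responses to texts. Research examines reader response either through the textual traces that readers leave “naturally” online (e.g., on platforms such as Goodreads or Amazon) or through data obtained in controlled experimental studies. Here “readers” and their “responses” are modeled at the levels of reader type (professional, amateur, differences in competence), social metadata (library or professional classification systems), textual response (testimonial evidence and book reviews), and cognitive/affective response (ratings, eye tracking, brain imaging, etc.).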

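Cognitive literary studies (CLS) represents an important complementary dimension to computational stylistics. Its goal is to provide valid general descriptions of the aspects of the stimulus (i.e., literature) and of the context (i.e., reader personality, reading situation, socio-historical environment) that systematically affect literary experience.[165] In the broadest sense, literary experience comprises the directly or indirectly measurable responses that accompany the act of reading,[166] including bodily responses (e.g., changes in heart rate or skin conductance), neurocognitive responses (e.g., modulation of brain activity), and behavioral responses (e.g., verbal reports, ratings, or eye movements).[167] Extensive reviews by Burke[168] and by Schrott and Jacobs[169] hold that, beyond such adequate description, the scientific study of literature must aim to explain, predict, and control (i.e., systematically modulate) literary experience. Only then might it be possible to answer questions such as what literature is and why we turn to literary reading.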

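CLS follows up on largely theoretical work in classical and modern poetics.[170] Publications in specialist journals (such as Poetics, Poetics Today, SSOL, Language and Literature, and Style) give ample evidence of the rich diversity of theoretical perspectives, questions, hypotheses/models, and methods used to tackle this rich and complex field of research.[171] Some recent papers have tried to bring order to the apparent creative chaos by (1) proposing classification schemes that organize CLS studies according to whether they use direct or indirect measures, online or offline methods, and text manipulation;[172] and (2) proposing a comprehensive theoretical framework, the “neurocognitive poetics model” of literary reading,[173] which allows researchers to generate testable predictions about all the levels of literary experience mentioned above. Whether these attempts will help to establish (minimal) methodological standards and advance CLS in the future remains to be seen.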

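The main challenges facing CLS fall into two categories. The first concerns stimulus material. “Literature” is such a vast and heterogeneous object that anyone wishing to enter the field may already get lost at the survey stage. One can, of course, choose any literary text, prose or poetry, published anywhere on the planet, guided by personal interest or by the goals of the project at hand. Jacobs has outlined a blueprint of best practice:[174] (1) a publicly available, ecologically valid, and widely accepted set of relevant literary test materials (e.g., Shakespeare’s 154 sonnets); (2) appropriate open-access databases/training corpora for these materials (e.g., the Gutenberg Library English Corpus[175]); (3) advanced tools for qualitative-quantitative narrative analysis (Q2NA) and machine learning[176] for extracting features of the materials;[177] and (4) open-access databases of reader responses, ideally offering at least some previously collected data on the materials of interest to the researcher. As these areas mature, we expect important opportunities for collaboration that coordinate large-scale text analysis with the study of readers’ cognitive and emotional responses, so as to address questions of literary meaning more effectively.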

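V. Outlook

Although the broad field of computational stylistics can pose a wide range of types of question, they all converge on a shared set of methods and concerns. Cultural heritage data have great potential to supplement digital archives through retrodigitization.[178] This kind of data curation may eventually lead to large-scale, anthropologically informed, and rigorous comparative research on the evolution of fiction and storytelling throughout human history. Born-digital literary discourse, on the other hand, includes new kinds of big data, such as book reviews by lay readers, real-time response data from e-readers, and fan-based fiction writing, all of which call for academically rigorous, data-driven theorization.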

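Overall, the field converges on shared techniques of text analysis, data representation, machine learning, and visualization. As in other areas of data science, these methods bring with them a number of challenges that will shape future research. These challenges fall into four categories.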

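(1) Theory

Critiques of CS from outside the field generally charge it with a lack of sustained engagement with literary or linguistic theorization:[179] by fixating on “simple word counts” as its main route to conclusions, it is said to reduce the rich complexity of its source material. Much recent work has begun to attend to the relation between theory and model building,[180] with particular emphasis on a deeper understanding of the conceptual and empirical frameworks that current computational methods approximate.[181] Future work will focus on two important and related issues. The first, described in the section on “construct validity” below, concerns the ability of a given model to represent a theoretical construct adequately. More can be done to specify which constructs are represented in a model (and which are not). Second, we foresee a strong body of future research that attends more explicitly to “theory testing,” that is, to assessing how well theoretically relevant constructs explain real-world phenomena (“textual behavior”). This could make possible a systematic interrogation of many existing literary theories and the development of new theoretical models that guide the formation of data structures. Overall, future work will strive to close the gap between the data-intensive and the theory-intensive ends of (computational) literary studies.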

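(2) Data

Collection and representativeness. As researchers collect data in ever larger quantities and draw generalizable inferences from these data sets, two fundamental problems arise concerning the sources of data in computational stylistics. The first is access to data, in particular efforts to build, integrate, and re-launch[182] textual databases of the classical, medieval, early modern, and post-eighteenth-century periods (including cross-cultural material). This requires overcoming the copyright restrictions that currently hinder access to and sharing of data,[183] as well as negotiating conditions, including licensing, for collecting, improving, and sharing copyrighted material for research purposes. Another important issue is format: future research may opt for more flexible forms of metadata, including Semantic Web technologies (RDF, RDFS, OWL), possibly in combination with XML technologies.[184]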

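The second, more theoretical problem concerns the representativeness of samples.[185] Almost all research to date has used non-probability samples (i.e., convenience samples), which severely limits the generalizability of conclusions to data outside the sample.[186] Herrmann and Lauer argue that this is largely because probability sampling is challenging in CS,[187] if not impossible, owing to the lack of comprehensive records (e.g., a list of all nineteenth-century novels, or all library lending records across a given span of time and range of languages; for attempts to measure specific text populations, see[188]). However, using multiple samples, drawn according to different theoretically precise criteria and possibly from different sources, is one way to gain confidence in the representativeness of any single estimate. Many estimates remain valid across many kinds of sampling technique.[189] Similarly, data scientists have begun to show how analytical steps can be used to overcome the limitations of convenience samples.[190] Nevertheless, a great deal of work is still needed to understand better the representativeness of the data sets available for different languages and periods.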

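Metadata and annotation. Another major obstacle is the lack and/or unreliability of metadata and annotation. Although far more digital texts are available now than ever before, analysis is severely hampered by the lack of consistently reliable metadata (e.g., author names, publication dates, genre classifications). In addition, we still lack a rich understanding of textual data beyond tokenization practices (e.g., reducing texts to word frequencies).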

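Much more work is needed to advance the bibliographic classification of texts, including author metadata such as nationality, gender, and race, as well as genre classification. Underwood’s classification of English-language fiction, poetry, and drama in the Hathi Trust database is a model of this important work.[191] To facilitate similar research, the German National Library is extending external access to its complete online metadata[192] and allowing online research on the full texts of its born-digital holdings.[193] More generally, comprehensive and sustainable access to metadata beyond libraries is needed. This could be achieved, for example, through linked open data, using Semantic Web technologies for flexible data representation and access.[194]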

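To enrich the metadata of literary collections, more research is needed to standardize and validate large-scale data annotation. A number of automated systems, including bookNLP,[195] help to annotate texts at the word and phrase level for parts of speech, dependency relations, named entities, and supersense categories (such as types of verbs and nouns). Similar functionality is provided by infrastructures such as CLARIN and DARIAH and by tools such as CWB[196] or TXM.[197] Future research will develop robust systems for tagging increasingly complex features such as metaphor,[198] narrative levels, time frames,[199] and models of causality and contingency, and will standardize the tagging of existing features across languages. Research on best practices for integrating human and machine annotation will be essential for the field to move forward.[200]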

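(3) Methodological Validity

The validity and standardization of tools, instruments, and procedures is another key area for the future. It involves, first, “tool criticism,” which can be understood as “the evaluation of the suitability of a given digital tool for a specific task.”[201] The social sciences and psychology have long referred to this as “construct validity,”[202] that is, the fit between a measurement and the theoretical construct it is meant to represent. Despite decades of discussion in these neighboring fields, computational stylistics has until recently tended to move from tools to concepts. New work within CS, however, increasingly stresses a “theory-first” approach that concentrates on developing critical constructs, for which suitable tools and measurements are then developed. The aim of tool criticism is therefore “to better understand the impact of any tool bias on the specific task, not to improve the tool’s performance.”[203]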

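Another important factor is the role of human judgment. “Researcher degrees of freedom” can strongly influence research results and conclusions.[204] As an emerging field, CS should develop procedures that codify best practices for assessing the stability and validity of concepts and models and the appropriateness of the conclusions drawn from them. In the widely used method of topic modeling, for example, implementations vary considerably, often without consideration of how parameters affect model results.[205] In data visualization, similar variation can be seen in the protocols used to generate visualizations and in where they are applied.[206] Future research should focus on establishing standardized procedures for data processing and reporting so that other researchers can evaluate a researcher’s choices.[207]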

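(4) Disciplinary Integration

Finally, we want to stress the goal of further disciplinary integration within the wider field of CS. We see computational stylistics as an integral part of other data-science research such as digital sociology, computational linguistics, and information studies. Building and maintaining interdisciplinary collaboration will foster important transfers of expertise across these domains, help establish consensus on best practices, and avoid needless repetition of old debates. We also see a need for closer integration of CS with the empirical experimental sciences, such as social psychology and neuroscience. Texts do not exist in isolation; they are socially activated through the engagement of readers and through the institutions that help to circulate them. Future research will need to learn more about the relation between textual qualities, the main concern of CS, and reader responses. Integrating these fields presents exciting opportunities to gain further insight into the meaningfulness of literary discourse.[208]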

————————————————————————————————————————————–

Computational Stylistics

J. Berenike Herrmann, Arthur M. Jacobs, Andrew Piper

Abstract: Computational Stylistics (CS) is a field of enquiry that examines the forms, social embedding, and the aesthetic potential of literary texts by means of computational and statistical methods. Operating on larger data sets with more transparent methodologies, CS offers literary studies new scales of observation and new methods of interpretation, to test existing theories and form new ones. As in many data-driven fields, methods range across exploratory, explanatory, and predictive modeling, with important debates addressing the affordances and limitations of each. From its multiple heritages in authorship attribution, stylistics, and natural language processing, CS has evolved to tackle ever more ambitious theoretical questions, including style, genre, and epoch; literary topoi, plot, and character networks; narrative perspective, figure characterization, and emotion; gender, race, and social status; canonicity, literariness, and textual quality; and cognitive representations of word beauty, metaphor, and rhyme. Situated within the data sciences, CS comprises distinct knowledge domains, in which the affordances of the digital (method, medium) and the statistical interact with the epistemic to produce new knowledge at the analytic levels of “text,” “context,” “author,” and “reader.”

Keywords: Computational Stylistics; Data Science; Literary Study; Corpus

————————————————————————————————————————————–

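(Editor: Jiang Wentao)

Notes:

[1]The term “computational stylistics” also refers specifically to advanced stylometry; the research group of that name was initially founded by Maciej Eder, Mike Kestemont, and Jan Rybicki, https://computationalstylistics.github.io/. This article adopts a broader definition that encompasses different computational approaches to literary phenomena, with emphasis on modeling the reader dimension of literary discourse. See: Maciej Eder, Mike Kestemont, Jan Rybicki, “Stylometry with R: A Package for Computational Text Analysis,”The R Journal, vol. 8, no. 1, 2016, pp. 107-121, https://journal.r-project.org/archive/2016-1/eder-rybicki-kestemont.pdf.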

[2]P. Juola, “Authorship Attribution,”Foundations and Trends in Information Retrieval, vol. 1, no. 3, 2006, pp. 233-334, https://doi.org/10.1561/1500000005. See also the stylometry bibliography at https://www.zotero.org/groups/643516/stylometry_bibliography.

[3]J. R. Firth, “A Synopsis of Linguistic Theory, 1930-1955,” ed. Philological Society (Great Britain), Studies in Linguistic Analysis, London: Blackwell, pp. 1-32; J. M. Sinclair, Corpus, Concordance, Collocation, Oxford: Oxford University Press, 1991.

[4]M. Mahlberg, Corpus Stylistics and Dickens’s Fiction, London: Routledge, 2013, https://doi.org/10.4324/9780203076088.

[5]C. D. Manning, H. Schütze, Foundations of Statistical Natural Language Processing (10th ed.), Cambridge: MIT Press, 2010.

[6]S. Hockey, “The History of Humanities Computing,” eds. S. Schreibman, R. Siemens, and J. Unsworth, A Companion to Digital Humanities, London: Blackwell, 2004, pp. 3-19.

[7]T. Underwood, “A Genealogy of Distant Reading,”Digital Humanities Quarterly, vol. 11, no. 2, 2017, https://www.digitalhumanities.org/dhq/vol/11/2/000317/000317.html.

[8]F. Hakemulder, W. van Peer, “Empirical stylistics,” ed. V. Sotirova, The Bloomsbury Companion to Stylistics, Continuum, New York: Bloomsbury Publishing, 2015.

[9]A. M. Jacobs, “The Scientific Study of Literary Experience: Sampling the State of the Art,”Scientific Study of Literature, vol. 5, no. 2, 2015, pp. 139-170, https://doi.org/10.1075/ssol.5.2.01jac; Jacobs, “Quantifying the Beauty of Words: A Neurocognitive Poetics Perspective,”Frontiers in Human Neuroscience, vol. 11, 2017, p. 622, https://doi.org/10.3389/fnhum.2017.00622; A. M. Jacobs, “The Gutenberg English Poetry Corpus: Exemplary Quantitative Narrative Analyses,”Frontiers in Digital Humanities, vol. 5, no. 5, 2018, https://doi.org/10.3389/fdigh.2018.00005; A. M. Jacobs, “(Neuro-)Cognitive Poetics and Computational Stylistics,”Scientific Study of Literature, vol. 8, no. 1, 2018, pp. 165-208, https://doi.org/10.1075/ssol.18002.jac; A. M. Jacobs, “Sentiment Analysis for Words and Fiction Characters from the Perspective of Computational (Neuro-)Poetics,”Frontiers in Robotics and AI, vol. 6, no. 53, 2019, https://doi.org/10.3389/frobt.2019.00053; I. Schindler et al., “Measuring Aesthetic Emotions: A Review of the Literature and a New Assessment Tool,”PLOS ONE, vol. 12, no. 6, 2017, https://doi.org/10.1371/journal.pone.0178899.

[10]J. B. Herrmann, Digital Literary Stylistics. Enhanced Humanities (Unpublished Habilitationsschrift), Faculty of Humanities and Social Sciences, University of Basel, 2019.

[11]E. Hayot, “Against Historicist Fundamentalism,”PMLA, vol. 131, no. 5, 2016, pp. 1414-1422, https://doi.org/10.1632/pmla.2016.131.5.1414; H. R. Jauß, Literaturgeschichte als Provokation, Berlin: Suhrkamp, 1970.

[12]F. Moretti, “Conjectures on World Literature,”New Left Review, vol.1, 2000, pp. 54-68, https://newleftreview.org/issues/II1/articles/franco-moretti-conjectures-on-world-literature.

[13]W. C. Dimock, Through Other Continents: American Literature Across Deep Time, Princeton: Princeton University Press, 2006; T. Underwood, Why Literary Periods Mattered: Historical Contrast and the Prestige of English Studies, Palo Alto: Stanford University Press, 2013, http://www.sup.org/books/title/?id=22262.

[14]G. Crane, “What do You do with a Million Books?,”D-Lib Magazine, vol. 12, no. 3, 2006, https://doi.org/10.1045/march2006-crane.

[15]T. Underwood, Distant Horizons: Digital Evidence and Literary Change, Chicago: University of Chicago Press, 2019.

[16]J. M. Hughes et al., “Quantitative Patterns of Stylistic Influence in the Evolution of Literature,”Proceedings of the National Academy of Sciences, vol. 109, no. 20, 2012, pp. 7682-7686, https://doi.org/10.1073/pnas.1115407109.

[17]http://dls.hypotheses.org/; https://cordis.europa.eu/project/id/101004984.

[18]T. Arnold, L. Tilton, Humanities Data in R: Exploring Networks, Geospatial Data, Images, and Text, London: Springer, 2015, https://doi.org/10.1007/978-3-319-20702-5; R. H. Baayen, Analyzing Linguistic Data. A Practical Introduction to Statistics Using R, Cambridge: Cambridge University Press, 2008, https://doi.org/10.1017/CBO9780511801686; S. T. Gries, Statistics for Linguistics with R: A Practical Introduction, Berlin: De Gruyter, 2013, https://doi.org/10.1515/9783110307474; M. L. Jockers, Text Analysis with R for Students of Literature, London: Springer, 2014, https://doi.org/10.1007/978-3-319-03164-4; F. Karsdorp et al., Humanities Data Analysis, Princeton: Princeton University Press, 2021; A. Piper, The Fish and the Painting, n. d., https://r4thehumanities.home.blog/; J. Silge, D. Robinson, Text Mining with R: A Tidy Approach (First edition), Sebastopol: O’Reilly, 2017; J. Silge, D. Robinson, Text Mining with R. A Tidy Approach (updated online edition), 2020, https://www.tidytextmining.com/.

[19]F. Jannidis, G. Lauer, “Burrows’s Delta and its Use in German Literary History,” eds. M. Erlin, L. Tatlock, Distant Readings: Topologies of German Culture in the Long Nineteenth Century, New York: Boydell & Brewer, 2014, pp. 29-54, www.jstor.org/stable/10.7722/j.ctt5vj848.5; A. Piper, Enumerations: Data and Literary Study, Chicago: University of Chicago Press, 2018, pp. 94-117, https://www.journals.uchicago.edu/doi/10.1086/703888; T. Underwood, Distant Horizons, 2019.

[20]R. Jakobson, “Language in Literature,” eds. K. Pomorska, S. Rudy, Language in Literature, Harvard: Harvard University Press, 1987.

[21]M. Salgaro, “Historical Introduction to the Special Issue on Literariness,”Scientific Study of Literature, vol. 8, no. 1, 2018, pp. 5-17, https://doi.org/10.1075/ssol.00005.sal.

[22]A. Piper, Enumerations: Data and Literary Study, pp. 94-117.

[23]L. McGrath et al., “Measuring Modernist Novelty,”Journal of Cultural Analytics, 2018, https://doi.org/10.22148/16.027; D. Liddle, “Could Fiction have an Information History? Statistical Probability and the Rise of the Novel,”Journal of Cultural Analytics, 2019, https://doi.org/10.22148/16.033.

[24]K. Van Dalen-Oskam, “Prehistory of the riddle,” (“The Riddle of Literary Quality: The Search for Conventions of Literariness,” Transl. of: “The Riddle of Literary Quality. Op Zoek Naar Conventies Van literariteit,”) Vooys: Tijdschrift Voor Letteren, vol. 32, no. 3, 2014, pp. 25-33, http://literary-quality.huygens.knaw.nl/?p=537#more-537; W. van Peer, ed. “The Quality of Literature: Linguistic Studies in Literary Evaluation,”Linguistic approaches to literature (LAL), vol. 4, Amsterdam: John Benjamins, 2008, https://doi.org/10.1075/lal.4.

[25]T. Underwood, “The Life Cycles of Genres,”Journal of Cultural Analytics, 2016, https://doi.org/10.22148/16.005; F. Jannidis et al., “Makroanalytische Untersuchung von Heftromanen,” DHd 2019, Digital Humanities: Multimedial & Multimodal. Konferenzabstracts, 2019, pp. 167-172, https://doi.org/10.5281/zenodo.2596095; M. Wilkens, “Genre, Computation, and the Varieties of Twentieth-Century U. S. Fiction,”Journal of Cultural Analytics, 2016, https://doi.org/10.22148/16.009.

[26]F. Jannidis, G. Lauer, “Burrows’s Delta and its Use,” pp. 29-54.

[27]J. F. Burrows, Computation into Criticism: A Study of Jane Austen’s Novels and an Experiment in Method, Oxford: Oxford University Press, 1987, https://www.jstor.org/stable/27710118; J. O’Sullivan et al., “Measuring Joycean Influences on Flann O’Brien,”Digital Studies/Le Champ Numérique, vol. 8, no. 1, 2018, p. 6, https://doi.org/10.16995/dscn.288; A. Tuzzi, M. A. Cortelazzo, “What is Elena Ferrante? A Comparative Analysis of a Secretive Bestselling Italian Writer,”Digital Scholarship in the Humanities, vol. 33, no. 3, 2018, pp. 685-702, https://doi.org/10.1093/llc/fqx066.

[28]P. Bourdieu, Distinction: A Social Critique of the Judgement of Taste(Reprint 1984 ed.), Harvard: Harvard University Press, 2000.

[29]T. Eagleton, Ideology: An Introduction, London: Verso, 1991; G. Lukács, The Historical Novel, Dublin: Merlin Press, 1965; R. Williams, Problems in Materialism and Culture: Selected Essays, London: Verso, 1980.

[30]C. Gallagher, S. Greenblatt, Practicing New Historicism, Chicago: University of Chicago Press, 2001.

[31]R. Darnton, “What is the History of Books?,”Daedalus, vol. 111, no. 3, pp. 65-83, 1982, http://nrs.harvard.edu/urn-3:HUL.InstRepos:3403038.

[32]S. E. Fish, Is There A Text in This Class?: The Authority of Interpretive Communities, Harvard: Harvard University Press, 1980.

[33]W. Van Peer, Stylistics and Psychology: Investigations of Foregrounding, London: Croom Helm, 1986,https://doi.org/10.1017/S0142716400000485.

[34]N. Groeben, Literaturpsychologie: Literaturwissenschaft Zwischen Hermeneutik und Empirie, Berlin: Kohlhammer, 1972.

[35]Fish, Is There A Text in This Class?, 1980.

[36]In identifying the “primary orientations” of literary studies, Köppe and Winko (2013) propose a similar heuristic grouping into the main categories “text-,” “context-,” “author-,” and “reader-oriented,” and add the caution that “the relations and differences between the approaches are not taken into account; moreover, the distinction is not based on clear or consistent classificatory criteria.”

[37]J. B. Herrmann et al., “Revisiting Style, A Key Concept in Literary Studies,”Journal of Literary Theory, vol. 9, no. 1, 2015, pp. 25-52, https://doi.org/10.1515/jlt-2015-0003.

[38]J. Burrows, “‘Delta’: A Measure of Stylistic Difference and A Guide to Likely Authorship,”Literary and Linguistic Computing, vol. 17, no. 3, 2002, pp. 267-287, https://doi.org/10.1093/llc/17.3.267.

[39]M. Erlin, “Topic Modeling, Epistemology, and the English and German Novel,”Journal of Cultural Analytics, 2017, https://doi.org/10.22148/16.014; D. L. Hoover, “Corpus Stylistics, Stylometry, and the Styles of Henry James,”Style, vol. 41, no. 2, 2007, pp. 174-203, www.jstor.org/stable/10.5325/style.41.2.174; F. Jannidis et al., “Makroanalytische Untersuchung von Heftromanen,” pp. 167-172; A. Piper, “Novel Devotions: Conversional Reading, Computational Modeling, and the Modern Novel,”New Literary History, vol. 46, no. 1, 2015, pp. 63-98, https://doi.org/10.1353/nlh.2015.0008; T. Underwood, “The Life Cycles of Genres,” 2016; M. Wilkens, “Genre, Computation,” 2016.

[40]M. L. Jockers, Macroanalysis: Digital Methods and Literary History, Chapter 6, Champaign: University of Illinois Press, 2013, http://doi.org/10.16995/dscn.62; F. Jannidis, G. Lauer, “Burrows’s Delta and its Use,”pp. 29-54.

[41]J. Cheng, “Fleshing out Models of Gender in English-Language Novels (1850-2000),”Journal of Cultural Analytics, 2020, https://doi.org/10.22148/001c.11652; A. M. Jacobs, “Sentiment Analysis for Words,” 2019; E. Kraicer, A. Piper, “Social Characters: The Hierarchy of Gender in Contemporary English-Language Fiction,”Journal of Cultural Analytics, 2019, https://doi.org/10.22148/16.032; A. Piper, Enumerations: Data and Literary Study, pp. 94-117; T. Underwood et al., “The Transformation of Gender in English-Language Fiction,”Journal of Cultural Analytics, 2018, https://doi.org/10.22148/16.019.

[42]C. Schöch, “Corneille, Molière et les Autres. Stilometrische Analysen zu Autorschaft und Gattungszugehörigkeit im Französischen Theater der Klassik,” eds. C. Schöch, L. Schneider, Literaturwissenschaft im Digitalen Medienwandel, 2014, pp. 130-157, http://web.fu-berlin.de/phin/beiheft7/b7t08.pdf.

[43] A. Tuzzi, M. A. Cortelazzo, “What is Elena Ferrante?,” pp. 685-702.

[44]S. Rebora, et al., “Robert Musil, A War Journal, and Stylometry: Tackling the Issue of Short Texts in Authorship Attribution,”Digital Scholarship in the Humanities, 2018, https://doi.org/10.1093/llc/fqy055.

[45]Using Craig’s and Eder’s Zeta; C. Schöch, “Zeta für die kontrastive Analyse literarischer Texte. Theorie, Implementierung, Fallstudie,” eds. T. Bernhart et al., Quantitative Ansätze in den Literatur- und Geisteswissenschaften, Systematische und historische Perspektiven, Berlin: De Gruyter, 2018, pp. 77-94, https://www.degruyter.com/view/books/9783110523300/9783110523300-004/9783110523300-004.xml.

[46]For a similar approach to the study of Goethe’s authorship, see M. Kestemont et al., “A Computational Approach to Authorship Verification of Johann Wolfgang Goethe’s Contributions to the Frankfurter Gelehrte Anzeigen (1772-1773),”Journal of European Periodical Studies, vol. 4, no. 1, 2019, https://doi.org/10.21825/jeps.v4i1.10188.

[47]M. Mahlberg et al., “Phrases in Literary Contexts: Patterns and Distributions of Suspensions in Dickens’s Novels,”International Journal of Corpus Linguistics, vol. 18, no. 1, 2013, pp. 35-56, https://doi.org/10.1075/ijcl.18.1.05mah.

[48]M. P. Eve, Close Reading with Computers: Textual Scholarship, Computational Formalism, and David Mitchell’s Cloud Atlas, Palo Alto: Stanford University Press, 2019, http://www.sup.org/books/title/?id=30253.

[49]M. Stubbs, “Conrad in the Computer: Examples of Quantitative Stylistic Methods,”Language and Literature, vol. 5, no. 1, 2005, pp. 5-24, https://doi.org/10.1177/0963947005048873.

[50]Herrmann et al., “The Kafka Case: Using Psychostylometry for Digital Author-Oriented Criticism,” in preparation.

[51]Y. R. Tausczik, J. W. Pennebaker, “The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods,”Journal of Language and Social Psychology, vol. 29, no. 1, 2010, pp. 24-54, https://doi.org/10.1177/0261927X09351676.

[52]D. Van Hulle, “Editing the Wake’s Genesis: Digital Genetic Criticism,” ed. G. Fields, James Joyce and Genetic Criticism, Leiden: Brill, 2018, pp. 37-54, https://doi.org/10.1163/9789004364288_005.

[53]T. Ries, “The Rationale of the Born-digital Dossier Génétique: Digital Forensics and the Writing Process: With Examples from the Thomas Kling Archive,”Digital Scholarship in the Humanities, vol. 33, no. 2, 2018, pp. 391-424, https://doi.org/10.1093/llc/fqx049.

[54]G. Genette, Narrative Discourse: An Essay in Method, vol. 3, New York: Cornell University Press, 1983; K. Hamburger, The Logic of Literature, 2nd ed., trans. M. J. Rose, Bloomington: Indiana University Press, 1973.

[55]G. N. Leech, M. Short, Style in Fiction: A Linguistic Introduction to English Fictional Prose, 2nd ed., London: Pearson Longman, 2007; L. Spitzer, Stilstudien, München: Hueber, 1928.

[56]B. M. Eikhenbaum, “The Theory of the ‘Formal Method,’” eds. L. Matejka, K. Pomorska, Readings in Russian Poetics, Cambridge: MIT Press, 1971, pp. 3-37.

[57]S. Zetterberg Gjerlevsen, “Fictionality,” eds. P. Hühn, J. C. Meister, J. Pier, and W. Schmid, The Living Handbook of Narratology, Hamburg: Hamburg University Press, 2016, https://www.lhn.uni-hamburg.de/node/138.html.

[58]T. Underwood, J. Sellers, “The Emergence of Literary Diction,”Journal of Digital Humanities, vol. 1, no. 2, 2012, pp. 17-26, http://journalofdigitalhumanities.org/1-2/the-emergence-of-literary-diction-by-tedunderwood-and-jordan-sellers/.

[59]See also: R. Heuser, L. Le-Khac, “A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method,”Pamphlets of the Stanford Literary Lab, vol. 4, 2012, https://litlab.stanford.edu/LiteraryLabPamphlet4.pdf.

[60]A. Piper, Enumerations: Data and Literary Study, pp. 94-117.

[61]M. Caracciolo, “Creative Metaphor in Literature,” eds. E. Semino, Z. Demjén, The Routledge Handbook of Metaphor and Language, London: Routledge, 2016, pp. 206-218, https://www.routledgehandbooks.com/doi/10.4324/9781315672953.ch14; E. Semino, G. J. Steen, “Metaphor in Literature,” ed. R. W. Gibbs, Cambridge Handbook of Metaphor and Thought, Cambridge: Cambridge University Press, 2008, pp. 232-246, https://doi.org/10.1017/CBO9780511816802.015.

[62]G. Fauconnier, M. Turner, “Rethinking Metaphor,” ed. R. W. Gibbs, The Cambridge Handbook of Metaphor and Thought, Cambridge: Cambridge University Press, 2008, pp. 53-66, https://doi.org/10.1017/CBO9780511816802.005; G. Lakoff, M. Johnson, Metaphors We Live by, Chicago: University of Chicago Press, 1980.

[63]G. Lakoff, M. Turner, More than Cool Reason: A Field Guide to Poetic Metaphor, Chicago: University of Chicago Press, 1989.

[64]T. Veale et al., Metaphor: A Computational Perspective, San Rafael, California: Morgan & Claypool, 2016, https://doi.org/10.2200/S00694ED1V01Y201601HLT031.

[65]E. Shutova, “Annotation of Linguistic and Conceptual Metaphor,” eds. N. Ide, J. Pustejovsky, Handbook of Linguistic Annotation, London: Springer, 2017, pp. 1073-1100, https://doi.org/10.1007/978-94-024-0881-2; G. J. Steen et al., “Metaphor in Usage,”Cognitive Linguistics, vol. 21, no. 4, 2010, pp. 765-796, https://doi.org/10.1515/cogl.2010.024.

[66]E. Semino, Metaphor in Discourse, Cambridge: Cambridge University Press, 2008; H. Tissari, “Corpus Linguistic Approaches to Metaphor Analysis,” eds. E. Semino, Z. Demjén, The Routledge Handbook of Metaphor and Language, London: Routledge, 2016, pp. 117-130.

[67]In addition, there is a growing number of automatic methods for metaphor detection, namely rule-based (Y. Neuman et al., “Metaphor Identification in Large Texts Corpora,”PLoS ONE, vol. 8, no. 4, Article e62343, 2013, https://doi.org/10.1371/journal.pone.0062343), unsupervised, or deep-learning methods (C. Tanasescu et al., “Metaphor Detection by Deep Learning and the Place of Poetic Metaphor in Digital Humanities,” AAAI Publications, The Thirty-First International Flairs Conference, 2018, https://aaai.org/ocs/index.php/FLAIRS/FLAIRS18/paper/viewFile/17704/16866).

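[68]The manually annotated VU Amsterdam Metaphor Corpus (VUAMC; G. J. Steen et al., VU Amsterdam Metaphor Corpus, 2010, http://www.ota.ox.ac.uk/headers/2541.xml) was compiled from the British National Corpus; its “fiction” sample draws on middlebrow British novels of the late twentieth century.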

[69]A. G. Dorst, “More or Different Metaphors in Fiction? A Quantitative Cross-Register Comparison,”Language and Literature, vol. 24, no. 1, 2015, pp. 3-22, https://doi.org/10.1177/0963947014560486.

[70]J. B. Herrmann, “Anschaulichkeit Messen: Eine Quantitative Metaphernanalyse an Deutschsprachigen Erzählanfängen Zwischen 1880 und 1926,” eds. T. Köppe, R. Singer, Show, Don’t Tell: Konzepte und Strategien Anschaulichen Erzählens, Bielefeld: Aisthesis Verlag, 2018, pp. 167-212.

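[71]W. Anderson et al., Mapping English Metaphor Through Time, Oxford University Press, 2016. For more on “Mapping Metaphor,” see: https://www.mappingmetaphor.arts.gla.ac.uk/. The “Mapping Metaphor” project systematically identifies cases in which words extend their meaning from one domain to another. It is limited to English, however, and its powerful visualization tool is not open to additional, possibly literary, material.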

[72]On entropy-based detection of metaphoric change in German, see D. Schlechtweg et al., “German in Flux: Detecting Metaphoric Change via Word Entropy,” Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL), 2017, pp. 354-367, https://doi.org/10.18653/v1/K17-1036.

[73]For a stylistic study of English, see L. Nuttall, C. Harrison, “Wolfing Down the Twilight Series: Metaphors for Reading in Online Reviews,” eds. H. Ringrow, S. Pihlaja, Contemporary Media Stylistics, London: Bloomsbury, 2020, pp. 35-60.

[74]J. B. Herrmann, T. Messerli, “Hungere Schon Nach dem Nächsten Band. Eine Untersuchung von Metaphern für Leseerfahrungen, in Web 2.0 Literaturrezensionen,” Jahres-Tagung “Digital Humanities im Deutschsprachigen Raum,” Book of Abstracts, 2020, https://doi.org/10.5281/zenodo.3666690.

[75]J. B. Herrmann, T. Messerli, “Metaphors We Read by: Finding Metaphorical Conceptualizations of Reading in Web 2.0 Book Reviews,” International Conference DH2020, https://dh2020.adho.org/wpcontent/uploads/2020/07/210_MetaphorswereadbyFindingmeta-phoricalconceptualizationsofreadinginweb20bookreviews.html.

[76]G. Philip, “Conventional and Novel Metaphors in Language,” eds. E. Semino, Z. Demjén, The Routledge Handbook of Metaphor and Language, London: Routledge, 2016, pp. 219-232.

[77]W. G. Reijnierse et al., “DMIP: A Method for Identifying Potentially Deliberate Metaphor in Language Use,”Corpus Pragmatics, vol. 2, 2018, pp. 129-147, https://doi.org/10.1007/s41701-017-0026-7.

[78]A. G. Dorst, “Personification in Discourse: Linguistic Forms, Conceptual Structures and Communicative Functions,”Language and Literature, vol. 20, no. 2, 2011, pp. 113-135, https://doi.org/10.1177/0963947010395522.

[79]Y. Bizzoni, “Detection and Aptness: A Study in Metaphor Detection and Aptness Assessment Through Neural Networks and Distributional Semantic Spaces,” Doctoral thesis, University of Gothenburg, GUPEA, 2019, http://hdl.handle.net/2077/58277.

[80]A. M. Jacobs, A. Kinder, “What Makes a Metaphor Literary? Answers from Two Computational Studies,”Metaphor and Symbol, vol. 33, no. 2, 2018, pp. 85-100, https://doi.org/10.1080/10926488.2018.1434943.

[81]I. Mani, “Computational Narratology,” eds. P. Hühn et al., The Living Handbook of Narratology, Hamburg: Hamburg University Press, 2013, http://www.lhn.uni-hamburg.de/article/computational-narratology.

[82]J. Sachs, A. Piper, “Technique and the Time of Reading,”PMLA, vol. 133, no. 5, 2018, pp. 1259-1267, https://doi.org/10.1632/pmla.2018.133.5.1259; T. Underwood, “Why Literary Time is Measured in Minutes,”ELH, vol. 85, no. 2, 2018, pp. 341-365, https://doi.org/10.1353/elh.2018.0013.

[83]M. L. Jockers, Syuzhet: An R Package for the Extraction of Sentiment and Sentiment-based Plot Arcs from Text, 2017, https://www.rdocumentation.org/packages/syuzhet/versions/1.0.6; A. Piper, Enumerations: Data and Literary Study, pp. 42-65; B. M. Schmidt, “Plot Arceology: A Vector-space Model of Narrative Structure,” 2015 IEEE International Conference on Big Data, 2015, pp. 1667-1672, https://ieeexplore.ieee.org/document/7363937.

[84]M. Sims et al., “Literary Event Detection,”Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 3623-3634, https://doi.org/10.18653/v1/P19-1353.

[85]For English: D. Bamman et al., “An Annotated Dataset of Coreference in English Literature,” 2019, ArXiv:1912.01140 [Cs]; for German: I. Rösiger et al., “Towards Coreference for Literary Text: Analyzing Domain-Specific Phenomena,”Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 2018, pp. 129-138, https://www.aclweb.org/anthology/W18-4515.

[86]A. Piper, Enumerations: Data and Literary Study, pp. 118-146.

[87]D. Bamman et al., “A Bayesian Mixed Effects Model of Literary Character,” Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, vol. 1: Long Papers, pp. 370-379, 2014, https://doi.org/10.3115/v1/P14-1035.

[88]J. Cheng, “Fleshing out Models of Gender,” 2020.

[89]A. M. Jacobs, “Sentiment Analysis for Words and Fiction Characters,” 2019. For a broader survey of sentiment analysis in literature, see: E. Kim, R. Klinger, “A Survey on Sentiment and Emotion Analysis for Computational Literary Studies,”Zeitschrift für digitale Geisteswissenschaften, 2018, arXiv preprint arXiv:1808.03137.

[90]G. N. Leech, M. Short, Style in Fiction: A Linguistic Introduction to English Fictional Prose, 2nd ed., London: Pearson Longman, 2007.

[91]J. F. Burrows, “Word-Patterns and Story-Shapes: The Statistical Analysis of Narrative Style,”Literary and Linguistic Computing, vol. 2, no. 2, 1987, pp. 61-70, https://doi.org/10.1093/llc/2.2.61.

[92]E. Semino, M. Short, Corpus Stylistics: Speech, Writing and Thought Presentation in a Corpus of English Writing, London: Routledge, 2004, https://doi.org/10.4324/9780203494073.

[93]B. Busse, Speech, Writing and Thought Presentation in a Corpus of Nineteenth-Century English Narrative Fiction, Bern: University of Bern Press, 2010.

[94]A. Brunner, Automatische Erkennung von Redewiedergabe: Ein Beitrag zur Quantitativen Narratologie, Berlin: De Gruyter, 2015.

[95]H. Long, R. J. So, “Turbulent Flow: A Computational Model of World Literature,”Modern Language Quarterly, vol. 77, no. 3, 2016, pp. 345-367, https://doi.org/10.1215/00267929-3570656; J. Brooke et al., “Using Models of Lexical Style to Quantify Free Indirect Discourse in Modernist Fiction,”Digital Scholarship in the Humanities, vol. 32, no. 2, 2017, pp. 234-250, https://doi.org/10.1093/llc/fqv072.

[96]J. Egbert, M. Mahlberg, “Fiction-One Register or Two?,”Register Studies, vol. 2, no. 1, 2020, pp. 72-101, https://doi.org/10.1075/rs.19006.egb; G. Muzny et al., “Dialogism in the Novel: A Computational Model of the Dialogic Nature of Narration and Quotations,”Digital Scholarship in the Humanities, vol. 32, suppl_2, 2017, pp. ii31-ii52, https://doi.org/10.1093/llc/fqx031.

[97]K. Vishnubhotla et al., “Are Fictional Voices Distinguishable? Classifying Character Voices in Modern Drama,” Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 2019, pp. 29-34, Association for Computational Linguistics.

[98]G. Muzny et al., “Dialogism in the Novel,” pp. ii31-ii52.

[99]M. Coll Ardanuy, C. Sporleder, “Clustering of Novels Represented as Social Networks,”Linguistic Issues in Language Technology, 2015, https://www.aclweb.org/anthology/2015.lilt-12.4.

[100]B. Volker, R. Smeets, “Imagined Social Structures: Mirrors or Alternatives? A Comparison between Networks of Characters in Contemporary Dutch Literature and Networks of the Population in the Netherlands,”Poetics, 2019, https://doi.org/10.1016/j.poetic.2019.101379.

[101]D. K. Elson et al., “Extracting Social Networks from Literary Fiction,” Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 2010, pp. 138-147, https://www.aclweb.org/anthology/P10-1015.

[102]M. Algee-Hewitt, “Distributed Character: Quantitative Models of the English Stage, 1550-1900,”New Literary History, vol. 48, no. 4, 2017, pp. 751-782, https://doi.org/10.1353/nlh.2017.0038.

[103]C. Schöch, “Corneille, Molière et les Autres,” pp. 130-157.

[104]P. Trilcke, “Social Network Analysis (SNA) als Methode Einer Textempirischen Literaturwissenschaft,” eds. P. Ajouri, K. Mellmann, C. Rauen, Empirie in der Literaturwissenschaft, Port Elizabeth: Mentis, 2013, pp. 201-247.

[105]F. Fischer, D. Skorinkin, “Social Network Analysis in Russian Literary Studies,” eds. D. Gritsenko et al., The Palgrave Handbook of Digital Russia Studies, 2021, pp. 517-536, London: Palgrave Macmillan, https://doi.org/10.1007/978-3-030-42855-6_30.

[106]T. Tangherlini, “Toward a Generative Model of Legend: Pizzas, Bridges, Vaccines, and Witches,”Humanities, vol. 7, no. 1, 2017, https://doi.org/10.3390/h7010001.

[107]M. Scheffel et al., “Time,” eds. P. Hühn et al., The Living Handbook of Narratology, Hamburg: Hamburg University, 2013, https://www.lhn.uni-hamburg.de/node/106.html.

[108]T. Underwood, “Why Literary Time is Measured in Minutes,” pp. 341-365.

[109]J. Sachs, A. Piper, “Technique and the Time of Reading,” pp. 1259-1267.

[110]J. C. Meister, “Tagging Time in Prolog: The Temporality Effect Project,”Literary and Linguistic Computing, vol. 20, Suppl, 2005, pp. 107-124, https://doi.org/10.1093/llc/fqi025.

[111]J. C. Meister, W. Schernus, Time: From Concept to Narrative Construct, Berlin: De Gruyter, 2011.

[112]R. Ikeo, “‘Colloquialization’ in Fiction: A Corpus-driven Analysis of Present-tense Fiction,”Language and Literature: International Journal of Stylistics, vol. 28, no. 3, 2019, pp. 280-304, https://doi.org/10.1177/0963947019868894.

[113]D. Cooper et al., Literary Mapping in the Digital Age, London: Routledge, 2016; M. Gavin, E. Gidal, “Scotland’s Poetics of Space: An Experiment in Geospatial Semantics,”Journal of Cultural Analytics, 2017, https://doi.org/10.22148/16.017; R. T. Tally, The Routledge Handbook of Literature and Space, London: Routledge, 2017, https://doi.org/10.4324/9781315745978.

[114]B. Piatti, A Literary Atlas of Europe, n.d., http://www.literaturatlas.eu/en/index.html.

[115]https://www.lancaster.ac.uk/mappingthelakes/; D. Cooper, I. Gregory, “Mapping the English Lake District: A Literary GIS,”Transactions of the Institute of British Geographers, vol. 36, no. 1, 2011, pp. 89-108, https://doi.org/10.1111/j.1475-5661.2010.00405.

[116]M. Wilkens, “The Geographic Imagination of Civil War-Era American Fiction,”American Literary History, vol. 25, no. 4, 2013, pp. 803-840, https://doi.org/10.1093/alh/ajt045.

[117]Stanford Literary Lab, “Mapping London’s Emotions,”New Left Review, vol. 101, 2016.

[118]A. M. Jacobs, “Neurocognitive Poetics: Methods and Models for Investigating the Neuronal and Cognitive-affective Bases of Literature Reception,”Frontiers in Human Neuroscience, vol. 9, no. 186, 2015, https://doi.org/10.3389/fnhum.2015.00186.

[119]T. Haider, “Metrical Tagging in the Wild: Building and Annotating Poetry Corpora with Rhythmic Features,” 2021, ArXiv:2102.08858 [Cs].

[120]M. Kraxenberger, W. Menninghaus, “Emotional Effects of Poetic Phonology, Word Positioning and Dominant Stress Peaks in Poetry Reading,”Scientific Study of Literature, vol. 6, no. 2, 2016, pp. 298-313, https://doi.org/10.1075/ssol.6.2.06kra; M. Kraxenberger, W. Menninghaus, “Affinity for Poetry and Aesthetic Appreciation of Joyful and Sad Poems,”Frontiers in Psychology, vol. 7, no. 2051, 2017, https://doi.org/10.3389/fpsyg.2016.02051.

[121]A.-S. Bories et al., Plotting Poetry: On Mechanically-Enhanced Reading, Liège: Presses Universitaires de Liège, in preparation.

[122]H. Katsma, “Loudness in the Novel,”Pamphlets of the Stanford Literary Lab, vol. 7, 2014, https://litlab.stanford.edu/LiteraryLabPamphlet7.pdf.

[123]T. Clement, S. McLaughlin, “Measured Applause: Toward a Cultural Analysis of Audio Collections,”Journal of Cultural Analytics, 2016, https://doi.org/10.22148/001c.11652.

[124]M. MacArthur et al., “Beyond Poet Voice: Sampling the (Non-) Performance Styles of 100 American Poets,”Journal of Cultural Analytics, 2018, https://doi.org/10.22148/16.022.

[125]N. M. Houston, A. Neal, “Reading the Visual Page of Victorian Poetry,”Digital Humanities, 2013, pp. 229-230.

[126]P. Fyfe, Q. Ge, “Image Analytics and the Nineteenth-Century Illustrated Newspaper,”Journal of Cultural Analytics, 2018, https://doi.org/10.22148/16.026.

[127]A. Piper et al., “The Page Image: Towards a Visual History of Digital Documents,”Book History, vol. 23, 2020, pp. 365-396.

[128]S. Abuelwafa et al., “Detecting Footnotes in 32 Million Pages of ECCO,”Journal of Cultural Analytics, 2018, https://doi.org/10.22148/16.029.

[129]G. Hall, “Literature as a Social Practice,” eds. S. Goodman, K. O’Halloran, The Art of English: Literary Creativity, London: Palgrave Macmillan, 2006, pp. 451-459.

[130]G. C. Spivak, “Can the Subaltern Speak?,” eds. C. Nelson, L. Grossberg, Marxism and the Interpretation of Culture, London: Palgrave Macmillan, 1988, pp. 271-313, https://doi.org/10.1007/978-1-349-19059-1_20.

[131]S. Brown, L. Mandell, “The Identity Issue,”Journal of Cultural Analytics, 2018, https://doi.org/10.22148/16.020.

[132]T. Underwood et al., “The Transformation of Gender,” 2018.

[133]E. Kraicer, A. Piper, “Social Characters,” 2019.

[134]L. Tatlock et al., “Crossing over: Gendered Reading Formations at the Muncie Public Library, 1891-1902,”Journal of Cultural Analytics, 2018, https://doi.org/10.22148/16.021.

[135]R. J. So et al., “Race, Writing, and Computation: Racial Difference and the US Novel, 1880-2000,”Journal of Cultural Analytics, 2019, https://doi.org/10.22148/16.031; R. J. So, E. Roland, “Race and Distant Reading,”PMLA, vol. 135, no. 1, 2020, pp. 59-73, https://doi.org/10.1632/pmla.2020.135.1.59.

[136]M. Algee-Hewitt, “Representing Race and Ethnicity in American Fiction: 1789-1920,”Journal of Cultural Analytics, vol. 12, 2020, pp. 28-60, https://doi.org/10.22148/001c.18509.

[137]E. Evans, M. Wilkens, “Nation, Ethnicity, and the Geography of British Fiction, 1880-1940,”Journal of Cultural Analytics, 2018, https://doi.org/10.22148/16.024.

[138]J. Lee et al., “Linked Reading: Digital Historicism and Early Modern Discourses of Race Around Shakespeare’s Othello,”Journal of Cultural Analytics, 2018, https://doi.org/10.31235/osf.io/tg23u.

[139]C. W. Koolen, Reading Beyond the Female: The Relationship Between Perception of Author Gender and Literary Quality, PhD thesis, University of Amsterdam, 2018, UvA-DARE (Digital Academic Repository), http://hdl.handle.net/11245.1/cb936704-8215-4f47-9013-0d43d37f1ce7; E. Losh, J. Wernimont, Bodies of Information: Intersectional Feminism and the Digital Humanities, Minnesota: University of Minnesota Press, 2019, https://doi.org/10.5749/j.ctv9hj9r9; L. Mandell, “Gender and Cultural Analytics: Finding or Making Stereotypes?,”Debates in the Digital Humanities, 2019, https://dhdebates.gc.cuny.edu/read/untitled-f2acf72c-a469-49d8-be35-67f9ac1e3a60/section/5d9c1b63-7b60-42dd-8cda-bde837f638f4#ch01; S. U. Noble, “Toward a Critical Black Digital Humanities,”Debates in the Digital Humanities, 2019, https://dhdebates.gc.cuny.edu/read/untitled-f2acf72c-a469-49d8-be35-67f9ac1e3a60/section/5aafe7fe-db7e-4ec1-935f-09d8028a2687#ch02; R. Risam, “Beyond the Margins: Intersectionality and the Digital Humanities,”Digital Humanities Quarterly, vol. 9, no. 2, 2015; R. Risam, New Digital Worlds: Postcolonial Digital Humanities in Theory, Praxis, and Pedagogy, Evanston: Northwestern University Press, 2019, https://doi.org/10.1080/02759527.2019.1575080.

[140]P. Bourdieu, Distinction: A Social Critique of the Judgement of Taste (Reprint 1984 ed.), Harvard: Harvard University Press, 2000.

[141]K. van Dalen-Oskam, “Prehistory of the Riddle,” pp. 25-33; W. van Peer, ed. “The Quality of Literature,” 2008. Measures of text quality have been developed in applied computational linguistics for decades, for use in discourse processing and educational software (D. S. McNamara et al., Automated Evaluation of Text and Discourse with Coh-Metrix, Cambridge: Cambridge University Press, 2014, https://doi.org/10.1017/CBO9780511894664; S. A. Crossley et al., “The Tool for the Automatic Analysis of Cohesion 2.0: Integrating Semantic Similarity and Text Overlap,”Behavior Research Methods, vol. 51, no. 1, 2019, pp. 14-27, https://doi.org/10.3758/s13428-018-1142-4). Recent predictive modeling tools for literary texts include QNArt and SentiArt (Jacobs, “The Gutenberg English Poetry Corpus,” 2018; Jacobs, “(Neuro-)Cognitive Poetics,” 2018; Jacobs, “Sentiment Analysis for Words,” 2019; cf. https://github.com/matinho13/SentiArt), which are also applicable to languages other than English.

[142]A. Piper, E. Portelance, “How Cultural Capital Works: Prizewinning Novels, Bestsellers, and the Time of Reading,”Post-45, 2016, http://post45.research.yale.edu/2016/05/how-cultural-capital-works-prizewinning-novels-bestsellers-and-the-time-of-reading/.

[143]See also: J. F. English, “Now, not Now: Counting Time in Contemporary Fiction Studies,”Modern Language Quarterly, vol. 77, no. 3, 2016, pp. 395-418, https://doi.org/10.1215/00267929-3570667.

[144]F. Jannidis et al., “Makroanalytische Untersuchung von Heftromanen,” pp. 167-172.

[145]T. Underwood, Distant Horizons: Digital Evidence and Literary Change, Chicago: University of Chicago Press, 2019.

[146]V. Ashok et al., “Success with Style: Using Writing Style to Predict the Success of Novels,” Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013, pp. 1753-1764, https://www.aclweb.org/anthology/D13-1181.pdf.

[147]A. van Cranenburgh et al., “Vector Space Explorations of Literary Language,”Language Resources and Evaluation, vol. 53, no. 4, 2019, pp. 625-650, https://doi.org/10.1007/s10579-018-09442-4.

[148]J. Kao, D. Jurafsky, “A Computational Analysis of Style, Affect, and Imagery in Contemporary Poetry,” Proceedings of the NAACL-HLT 2012 Workshop on Computational Linguistics for Literature, 2012, pp. 8-17, https://citeseerx.ist.psu.edu/showciting?cid=14406028.

[149]A. M. Jacobs, “(Neuro-)Cognitive Poetics and Computational Stylistics,” pp. 165-208.

[150]R. Heydebrand, S. Winko, “The Qualities of Literatures: A Concept of Literary Evaluation in Pluralistic Societies,” ed. W. van Peer, The Quality of Literature. Linguistic Studies in Literary Evaluation, Amsterdam: John Benjamins, 2008, pp. 223-239, https://doi.org/10.1075/lal.4.16hey.

[151]S. Murray, The Digital Literary Sphere: Reading, Writing, and Selling Books in the Internet Era, Baltimore: Johns Hopkins University Press, 2018, https://doi.org/10.1007/s12109-019-09709-w.

[152]R. Darnton, “What is the History of Books?,” pp. 65-83.

[153]Fish, Is There A Text in This Class?, 1980.

[154]S. Rebora et al., “Digital Humanities and Digital Social Reading,”Digital Scholarship in the Humanities, 2021, https://doi.org/10.1093/llc/fqab020.

[155]S. Dimitrov et al., “Goodreads Versus Amazon: The Effect of Decoupling Book Reviewing and Book Selling,”International AAAI Conference on Web and Social Media, 2015, https://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10557/10452.

[156]G. Lauer, Lesen im digitalen Zeitalter, Darmstadt: Wissenschaftliche Buchgesellschaft (wbg), 2020.

[157]K. Bourrier, M. Thelwall, “The Social Lives of Books: Reading Victorian Literature on Goodreads,”Journal of Cultural Analytics, 2020, https://doi.org/10.22148/001c.12049.

[158]van der Deijl et al., “The Canon of Dutch Literature According to Google,”Journal of Cultural Analytics, 2019, https://doi.org/10.22148/16.046.

[159]A. Riddell, K. van Dalen-Oskam, “Readers and Their Roles: Evidence from Readers of Contemporary Fiction in the Netherlands,”PLoS ONE, vol. 13, no. 7, 2018, Article e0201157, https://doi.org/10.1371/journal.pone.0201157.

[160]A. Piper, J. Manalad, “Measuring Unreading,”Goethe Yearbook, vol. 27, 2020, pp. 233-241, https://doi.org/10.2307/j.ctvxhrm69.19.

[161]R. Cordell et al., “Computational Methods for Uncovering Reprinted Texts in Antebellum Newspapers,” American Literary History, vol. 27, no. 3, 2015, pp. E1-E15, https://doi.org/10.1093/alh/ajv029.

[162]K. Bode, A World of Fiction: Digital Collections and the Future of Literary History, Ann Arbor: University of Michigan Press, 2018, https://doi.org/10.3998/mpub.8784777.

[163]I. Schindler et al., “Measuring Aesthetic Emotions: A Review of the Literature and a New Assessment Tool,”PLOS ONE, vol. 12, no. 6, 2017, e0178899, https://doi.org/10.1371/journal.pone.0178899.

[164]W. Menninghaus et al., “What are Aesthetic Emotions?,”Psychological Review, vol. 126, no. 2, 2019, pp. 171-195, https://doi.org/10.1037/rev0000135.

[165]P. Hoffstaedter, “Poetic Text Processing and its Empirical Investigation,”Poetics, vol. 16, no. 1, 1987, pp. 75-91, https://doi.org/10.1016/0304-422X(87)90037-4.

[166]W. Iser, The Implied Reader: Patterns of Communication in Prose Fiction from Bunyan to Beckett, Baltimore: Johns Hopkins University Press, 1978.

[167]M. Bortolussi, P. Dixon, Psychonarratology: Foundations for the Empirical Study of Literary Response, Cambridge: Cambridge University Press, 2003, https://doi.org/10.1017/CBO9780511500107; A. M. Jacobs, “The Scientific Study of Literary Experience,” pp. 139-170.

[168]M. Burke, Literary Reading, Cognition and Emotion: An Exploration of the Oceanic Mind, London: Routledge, 2011, https://doi.org/10.4324/9780203840306.

[169]R. Schrott, A. M. Jacobs, Gehirn und Gedicht: Wie wir Unsere Wirklichkeiten Konstruieren, Liberty Twp: Hanser, 2011; A. M. Jacobs, “The Scientific Study of Literary Experience,” pp. 139-170.

[170]For a recent review, see: M. Salgaro, “Historical Introduction to the Special Issue on Literariness,” pp. 5-17.

[171]P. Sopčák et al., “Introduction to the Special Issue,”Scientific Study of Literature, vol. 8, no. 1, 2018, pp. 1-5, https://doi.org/10.1075/ssol.00004.int.

[172]P. Dixon, M. Bortolussi, “Measuring Literary Experience: Comment on Jacobs (2016),”Scientific Study of Literature, vol. 5, no. 2, 2015, pp. 178-182, Table 1, https://doi.org/10.1075/ssol.5.2.03dix; A. M. Jacobs, “Neurocognitive Poetics: Methods and Models,” 2015.

[173]NCPM; A. M. Jacobs, “The Scientific Study of Literary Experience,” pp. 139-170.

[174]Jacobs, “(Neuro-)Cognitive Poetics,” 2018.

[175]Jacobs, “The Gutenberg English Poetry Corpus,” 2018.

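[176]See the publications describing “LitBank,” an annotated dataset of 100 English-language novels intended to support tasks in natural language processing and the computational humanities: https://github.com/dbamman/litbank; see also GutenTag (an NLP-driven tool for digital humanities research in the Project Gutenberg corpus), http://www.cs.toronto.edu/~jbrooke/gutentag/.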

[177]D. Bamman et al., “An Annotated Dataset of Literary Entities,” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, 2019, pp. 2138-2144, https://doi.org/10.18653/v1/N19-1220; J. B. Herrmann et al., “Revisiting Style, A Key Concept in Literary Studies,” pp. 25-52; Jacobs, “(Neuro-)Cognitive Poetics,” 2018; M. L. Jockers, Syuzhet: An R Package for the Extraction of Sentiment and Sentiment-based Plot Arcs from Text, 2017, https://www.rdocumentation.org/packages/syuzhet/versions/1.0.6.

[178]Including languages and cultures; see: J. B. Herrmann et al., “Towards Modeling the European Novel. Introducing ELTeC for Multilingual and Pluricultural Distant Reading,” International Conference DH2020, Ottawa, Canada, 2020, http://dx.doi.org/10.17613/p854-af61.

[179]N. Bubenhofer, P. Dreesen, “Linguistik Als Antifragile Disziplin? Optionen in Der Digitalen Transformation,” Digital Classics Online, vol. 4, no. 1, 2018, pp. 63-75, https://doi.org/10.11588/dco.2017.0.48493.

[180]A. Acker, T. Clement, “Data Cultures, Culture as Data-Special Issue of Cultural Analytics,”Journal of Cultural Analytics, 2019, https://doi.org/10.31235/osf.io/975g2; K. Bode, “Why You Can’t Model Away Bias,”Modern Language Quarterly, vol. 81, no. 1, 2020, pp. 95-124, https://doi.org/10.1215/00267929-7933102; J. Flanders, F. Jannidis, “Data Modeling,” eds. S. Schreibman, R. Siemens, J. Unsworth, A New Companion to Digital Humanities, New York: Wiley-Blackwell, 2016; M. Gavin, “Vector Semantics, William Empson, and the Study of Ambiguity,”Critical Inquiry, vol. 44, no. 4, 2018, pp. 641-673, https://doi.org/10.1086/698174; J. Kuhn, “Computational Text Analysis within the Humanities: How to Combine Working Practices from the Contributing Fields?,”Language Resources and Evaluation, vol. 53, 2019, pp. 565-602, https://doi.org/10.1007/s10579-019-09459-3; J. C. Meister, “From TACT to CATMA or A Mindful Approach to Text Annotation and Analysis,” eds. J. Nyhan, G. Rockwell, and S. Sinclair, On Making in the Digital Humanities: Essays on the Scholarship of Digital Humanities Development in Honour of John Bradley, 2020, https://jcmeister.de/downloads/texts/Meister_2020-TACT-to-CATMA.pdf; A. Piper, “Think Small: On Literary Modeling,”PMLA, vol. 132, no. 3, 2017, pp. 651-658, https://doi.org/10.1632/pmla.2017.132.3.651; R. J. So, “All Models are Wrong,”PMLA, vol. 132, no. 3, 2017, pp. 668-673, https://doi.org/10.1632/pmla.2017.132.3.668.

[181]M. Gavin, “Is There a Text in My Data? (Part 1): On Counting Words,”Journal of Cultural Analytics, 2020, https://doi.org/10.22148/001c.11830; M. Gavin et al., “Spaces of Meaning: Conceptual History, Vector Semantics, and Close Reading,” Debates in the Digital Humanities 2019, https://dhdebates.gc.cuny.edu/read/untitled-f2acf72c-a469-49d8-be35-67f9ac1e3a60/section/4ce82b33-120f-423f-ba4c-40620913b305#ch21; J. Herrmann et al., “Anatomy of Tools: A Closer Look at ‘Textual DH’ Methodologies,”DHQ: Digital Humanities Quarterly, Manuscript submitted for publication.

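[182]Existing databases often reflect theoretical frameworks inherited from particular schools of literary scholarship (or the non-academic principles embodied in collections such as Gutenberg.org).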

[183]L. Burnard et al., “In Search of Comity: TEI for Distant Reading,”Zenodo, 2019, https://doi.org/10.5281/ZENODO.3552489.

[184]For example, the Swiss KNORA (“Knowledge Organization, Representation and Annotation”) and SALSAH (“System for Annotation and Linkage of Sources in Arts and Humanities”), https://www.knora.org/.

[185]K. Bode, A World of Fiction, 2018; K. Bode, “Why You Can’t Model Away Bias,” 2020; A. Piper, “Do We Know what We are Doing?,” Journal of Cultural Analytics, 2019, https://doi.org/10.22148/001c.11826; T. Underwood, “Algorithmic Modeling: Or, Modeling Data We do not yet Understand,” eds. J. Flanders, F. Jannidis, The Shape of Data in Digital Humanities: Modeling Texts and Text-based Resources, London: Routledge, 2018, https://doi.org/10.4324/9781315552941.

[186]K. Bode, A World of Fiction, 2018; K. Bode, “Why You Can’t Model Away Bias,” 2020.

[187]J. B. Herrmann, G. Lauer, “Korpusliteraturwissenschaft. Zur Konzeption und Praxis am Beispiel Eines Korpus zur Literarischen Moderne,” Osnabrücker Beiträge zur Sprachtheorie (OBST), vol. 92, 2018, pp. 127-156.

[188]J. B. Herrmann et al., “Ein großer Berg Daten? Zur bibliothekswissenschaftlichen Dimension des korpusliteraturwissenschaftlichen Digital Humanities-Projekts High Mountains-Deutschschweizer Erzählliteratur 1880-1930,” Zeitschrift Für Bibliothekskultur/Journal for Library Culture, vol. 8, no. 1, 2021, https://doi.org/10.21428/1bfadeb6.6e2feff6.

[189]T. Underwood et al., “NovelTM Datasets for English-Language Fiction, 1700-2009,” Journal of Cultural Analytics, 2020, https://doi.org/10.22148/001c.13147.

[190]M. Salganik, Bit by Bit: Social Research in the Digital Age, Princeton: Princeton University Press, 2019.

[191]T. Underwood et al., “Mapping Genre at the Page Level in English-Language Volumes from HathiTrust, 1700-1899,” Poster at DH2014, http://hdl.handle.net/2142/50291.

[192]https://www.dnb.de/DE/Professionell/Metadatendienste/metadatendienste_node.html#sprg186916.

[193]F. Jannidis et al., “Makroanalytische Untersuchung von Heftromanen,” DHd 2019, Digital Humanities: Multimedial & Multimodal. Konferenzabstracts, 2019, pp. 167-172, https://doi.org/10.5281/zenodo.2596095.

[194]World Wide Web Consortium (n. d.), Linked data. Retrieved from W3C website: https://www.w3.org/standards/semanticweb/data.

[195]D. Bamman et al., “A Bayesian Mixed Effects Model,” pp. 370-379.

[196]http://cwb.sourceforge.net/.

[197]http://textometrie.ens-lyon.fr/.

[198]J. B. Herrmann, “Operationalisierung der Metapher zur Quantifizierenden Untersuchung Deutschsprachiger Literarischer Texte im Übergang vom Realismus zur Moderne,” ed. F. Jannidis, Tagungsband DFG-symposium “Digitale Literaturwissenschaft,” Berlin: De Gruyter, in press.

[199]E. Gius et al., “Special Issue: A Shared Task for the Digital Humanities: Annotating Narrative Levels,” Journal of Cultural Analytics, 2019, https://doi.org/10.22148/16.047.

[200]J. Kuhn, “Computational Text Analysis within the Humanities,” pp. 565-602.

[201]M. C. Traub, J. van Ossenbruggen, Workshop on Tool Criticism in the Digital Humanities, Report, Miami: CWI Techreport, 2015, https://ir.cwi.nl/pub/23500.

[202]M. Salganik, Bit by Bit: Social Research in the Digital Age, 2019.

[203]M. C. Traub, J. van Ossenbruggen, Workshop on Tool Criticism in the Digital Humanities, 2015.

[204]J. P. Simmons et al., “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant,” Psychological Science, vol. 22, no. 11, 2011, pp. 1359-1366, https://doi.org/10.1177/0956797611417632.

[205]J. Herrmann et al., “Anatomy of Tools: A Closer Look at ‘Textual DH’ Methodologies,” DHQ: Digital Humanities Quarterly, Manuscript submitted for publication.

[206]N. Bubenhofer et al., “The Linguistic Construction of World: An Example of Visual Analysis and Methodological Challenges,” ed. R. Scholz, Quantifying Approaches to Discourse for Social Scientists, London: Palgrave Macmillan, 2019, pp. 251-284, https://doi.org/10.1007/978-3-319-97370-8_9.

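[207]This includes open-science principles, such as the preregistration of hypotheses and the open dissemination of all data and code for a given research project; the Journal of Cultural Analytics is one example of this practice.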

[208]In addition to the references, our research coalition maintains a Zotero database for computational stylistics: https://www.zotero.org/groups/2358990/research_coalition_computational_stylistics.