Programming for Corpus Linguistics with Python and Dataframes

Keller, Daniel

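  • Publisher: Cambridge
  • Publication date: 2024-06-20
  • List price: $2,850
  • VIP price: $2,708 (95% of list price)
  • Language: English
  • Pages: 75
  • Binding: Hardcover - also called cloth, retail trade, or trade
  • ISBN: 1009486780
  • ISBN-13: 9781009486781
  • Related category: Python programming language
  • Overseas special-order title (requires separate checkout)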

Product description

This Element offers intermediate or experienced programmers algorithms for Corpus Linguistic (CL) programming in the Python language using dataframes that provide a fast, efficient, intuitive set of methods for working with large, complex datasets such as corpora. This Element demonstrates principles of dataframe programming applied to CL analyses, as well as complete algorithms for creating concordances; producing lists of collocates, keywords, and lexical bundles; and performing key feature analysis. An additional algorithm for creating dataframe corpora is presented including methods for tokenizing, part-of-speech tagging, and lemmatizing using spaCy. This Element provides a set of core skills that can be applied to a range of CL research questions, as well as to original analyses not possible with existing corpus software.
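To give a sense of what a dataframe-based concordance might look like, here is a minimal sketch using pandas. It is not taken from the book: the one-token-per-row layout, the `token` column name, and the `concordance` helper are all assumptions made for illustration. The idea is that shifting the token column up or down lines each word up with its neighbours, so a keyword-in-context table becomes a simple row filter.

```python
import pandas as pd

# Toy corpus with one row per token. The "token" column name is an
# assumption for this sketch, not a layout prescribed by the book.
text = "the cat sat on the mat and the dog sat on the rug"
corpus = pd.DataFrame({"token": text.split()})


def concordance(df: pd.DataFrame, node: str, window: int = 2) -> pd.DataFrame:
    """Return a keyword-in-context table for every occurrence of `node`.

    Hypothetical helper: columns L{k} hold the token k positions to the
    left of the node word, R{k} the token k positions to the right.
    """
    tokens = df["token"]
    out = pd.DataFrame({"node": tokens})
    # shift(k) moves values down k rows, so each row sees the token that
    # occurred k positions earlier; shift(-k) gives the token k positions later.
    for k in range(window, 0, -1):
        out[f"L{k}"] = tokens.shift(k, fill_value="")
    for k in range(1, window + 1):
        out[f"R{k}"] = tokens.shift(-k, fill_value="")
    # Keep only the rows where the node word occurs (case-insensitive).
    hits = out[tokens.str.lower() == node.lower()]
    left = [f"L{k}" for k in range(window, 0, -1)]
    right = [f"R{k}" for k in range(1, window + 1)]
    return hits[left + ["node"] + right]


kwic = concordance(corpus, "sat")
print(kwic.to_string())
```

Because the context columns are ordinary dataframe columns, the same table could in principle be grouped or counted to approximate a collocate list, though a real corpus would also need sentence or document boundaries so that context does not leak across texts.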
