Detecting relevant segment of text in legal domain


Basic information

Project abstract

The goal of this research is to investigate, design, and implement algorithms that detect (or recognize) and extract relevant segments of text, predict and recognize legal entities and context, and finally generate appropriate metadata to be stored in a structured database. The database can be used in several scenarios, from user queries to legal research by law practitioners. A "relevant segment" is defined as a contiguous piece of text that is relevant to the question of interest (or simply, the query). Relevance can be measured by different methods, depending on how relevance is interpreted. If we are looking for the name of a judge in a legal document, we can use a wide range of information extraction (IE) tools. IE draws on a broad spectrum of techniques: image segmentation, when an image of the document is available and a relevant segment is highly likely to appear in a specific zone; Conditional Random Fields (CRF) and Markov models; and machine learning and classification. While structured pieces of information such as entities can be extracted using IE techniques, the deeper, more ambiguous, and conceptual components of a legal document, such as the type of damage or the judge's decision and the case outcome, require supervised machine learning algorithms that go beyond IE techniques. This problem is neither a traditional IE problem nor a text classification problem.

To solve this problem, a legal document is partitioned into conceptually related segments such as header, case, citations, damages, decision, and so on. This step is called zoning and can be performed using supervised or unsupervised learning methods. Some zones, such as headers, are expected to appear in the very first section of the document, so they can be detected by unsupervised techniques. On the other hand, other components, such as "damages", may appear in any part of the document and need a supervised model built on a lexicon, on manually labeled ground truth, or on both.
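The zoning step described in the abstract can be illustrated with a minimal sketch. It is not the project's method: it assumes paragraphs are separated by blank lines, treats the first paragraph as the header (a simple position rule), and flags "damages" paragraphs with a small hypothetical lexicon and a hand-picked threshold in place of a trained supervised model. The names `DAMAGES_LEXICON`, `Zone`, and `zone_document` are invented for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical lexicon; a real one would be curated or learned from labeled data.
DAMAGES_LEXICON = {"damages", "compensation", "awarded", "award",
                   "loss", "losses", "injury"}


@dataclass
class Zone:
    label: str
    start: int  # index of the first paragraph in the zone
    end: int    # index of the last paragraph in the zone (inclusive)
    text: str


def split_paragraphs(document: str) -> list[str]:
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def lexicon_score(paragraph: str, lexicon: set[str]) -> float:
    """Fraction of the paragraph's tokens that appear in the lexicon."""
    tokens = re.findall(r"[a-z]+", paragraph.lower())
    if not tokens:
        return 0.0
    return sum(t in lexicon for t in tokens) / len(tokens)


def zone_document(document: str, header_paragraphs: int = 1,
                  threshold: float = 0.1) -> list[Zone]:
    """Label each paragraph, then merge neighbors with the same label."""
    zones: list[Zone] = []
    for i, para in enumerate(split_paragraphs(document)):
        if i < header_paragraphs:
            label = "header"    # position rule: headers come first
        elif lexicon_score(para, DAMAGES_LEXICON) >= threshold:
            label = "damages"   # lexicon rule: may appear anywhere
        else:
            label = "other"
        if zones and zones[-1].label == label:
            zones[-1].end = i   # extend the current contiguous zone
            zones[-1].text += "\n\n" + para
        else:
            zones.append(Zone(label, i, i, para))
    return zones
```

Each returned `Zone` is a contiguous span of paragraphs, which matches the abstract's definition of a relevant segment. In a fuller system the lexicon score would more plausibly serve as one feature of a classifier trained on manually labeled zones.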

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Makrehchi, Masoud

Improving clustering performance using independent component analysis and unsupervised feature learning
Content Tree Word Embedding for document representation
  • DOI:
    10.1016/j.eswa.2017.08.021
  • Published:
    2017-12-30
  • Impact factor:
    8.5
  • Authors:
    Kamkarhaghighi, Mehran; Makrehchi, Masoud
  • Corresponding author:
    Makrehchi, Masoud

Other grants held by Makrehchi, Masoud

Algorithms and applications of Link Mining: Making Sense of Network Data
  • Grant number:
    RGPIN-2021-03380
  • Fiscal year:
    2022
  • Funding amount:
    $18,200
  • Project category:
    Discovery Grants Program - Individual
Algorithms and applications of Link Mining: Making Sense of Network Data
  • Grant number:
    RGPIN-2021-03380
  • Fiscal year:
    2021
  • Funding amount:
    $18,200
  • Project category:
    Discovery Grants Program - Individual
Towards Predicting Socio-economic Systems by Mining Social Media Data
  • Grant number:
    RGPIN-2014-06591
  • Fiscal year:
    2019
  • Funding amount:
    $18,200
  • Project category:
    Discovery Grants Program - Individual
Towards Predicting Socio-economic Systems by Mining Social Media Data
  • Grant number:
    RGPIN-2014-06591
  • Fiscal year:
    2018
  • Funding amount:
    $18,200
  • Project category:
    Discovery Grants Program - Individual
Identifying General Product and Brand Names in Online Forums
  • Grant number:
    521298-2017
  • Fiscal year:
    2017
  • Funding amount:
    $18,200
  • Project category:
    Engage Grants Program
Towards Predicting Socio-economic Systems by Mining Social Media Data
  • Grant number:
    RGPIN-2014-06591
  • Fiscal year:
    2017
  • Funding amount:
    $18,200
  • Project category:
    Discovery Grants Program - Individual
Towards Predicting Socio-economic Systems by Mining Social Media Data
  • Grant number:
    RGPIN-2014-06591
  • Fiscal year:
    2016
  • Funding amount:
    $18,200
  • Project category:
    Discovery Grants Program - Individual
Towards Predicting Socio-economic Systems by Mining Social Media Data
  • Grant number:
    RGPIN-2014-06591
  • Fiscal year:
    2015
  • Funding amount:
    $18,200
  • Project category:
    Discovery Grants Program - Individual
Computer assisted generation and transformation of web content
  • Grant number:
    477757-2015
  • Fiscal year:
    2015
  • Funding amount:
    $18,200
  • Project category:
    Engage Grants Program
Towards Predicting Socio-economic Systems by Mining Social Media Data
  • Grant number:
    RGPIN-2014-06591
  • Fiscal year:
    2014
  • Funding amount:
    $18,200
  • Project category:
    Discovery Grants Program - Individual

Similar overseas grants

Chromosomal aberration detection in FFPE tissue using proximity ligation sequencing
  • Grant number:
    10759887
  • Fiscal year:
    2023
  • Funding amount:
    $18,200
Inherited and de novo genetic variants relevant to familial, recurrent and sporadic stillbirth
  • Grant number:
    10719376
  • Fiscal year:
    2023
  • Funding amount:
    $18,200
Development of recombinant VSV vaccines for emerging bunyaviruses
  • Grant number:
    10603853
  • Fiscal year:
    2023
  • Funding amount:
    $18,200
Interspecies reservoirs of antibiotic resistance for Neisseria gonorrhoeae
  • Grant number:
    10705474
  • Fiscal year:
    2023
  • Funding amount:
    $18,200
Multi-omics study of ancestry enriched associations in Hispanics/Latinos
  • Grant number:
    10889299
  • Fiscal year:
    2023
  • Funding amount:
    $18,200
Structural Variation and Hematological Traits
  • Grant number:
    10657020
  • Fiscal year:
    2023
  • Funding amount:
    $18,200
Genomic structural dynamics in fibroblasts during heart failure
  • Grant number:
    10733415
  • Fiscal year:
    2022
  • Funding amount:
    $18,200
NeoChip for specific and rapid identification of congenital CMV and neonatal HSV infections on minimal sample volume
  • Grant number:
    10701864
  • Fiscal year:
    2022
  • Funding amount:
    $18,200
Exploring novel nucleic acid therapeutic delivery methods and therapeutic strategies
  • Grant number:
    10514270
  • Fiscal year:
    2022
  • Funding amount:
    $18,200
Multiomic genomic mapping with long read sequencing
  • Grant number:
    10546355
  • Fiscal year:
    2022
  • Funding amount:
    $18,200