qt configuration system new project function

2022-12-22   ES  

Source | CSDN blog

Author | matt_sh, Editor | Carol

Produced by | CSDN Cloud Computing (ID: CSDNCloud)

This article collects my personal reading notes on relation extraction. It does not cover the application of deep learning to relation extraction.

Part of these notes comes from my own interpretation, part from the original papers, and part from online excerpts. Some notes are incomplete, and later additions should focus on cutting-edge papers from recent years.

Kernel methods

Reading: kernel methods in SVM

https://zhuanlan.zhihu.com/p/27445103

1. Dependency Tree Kernels for Relation Extraction

Idea: Parse the sentence into a syntactic dependency tree and build an augmented dependency tree, take the part of the tree that covers the sentence's two entities, define a kernel function over these trees to compute the similarity between them, and finally classify with an SVM. The drawback is that the method depends on the quality of the augmented dependency trees.

Rationale: The dependency tree encodes the dependencies between the different components of a sentence. The authors argue that instances of similar relations will also have similar structures in their dependency trees. The purpose of the kernel function is to measure the similarity between dependency trees, so once it is defined it only needs to be plugged into the SVM.

Experiments:

Uses the ACE dataset [only the 5 top-level relation types, not the 24 subtypes].

Different kernels are used in the SVM:

K0 = sparse kernel

K1 = contiguous kernel

K2 = bag-of-words kernel

K3 = K0 + K2

K4 = K1 + K2
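The composite kernels K3 and K4 rely on the fact that the sum of two valid kernels is again a valid kernel. The sketch below illustrates only that composition. `positional_match_kernel` is a hypothetical stand-in for a structural kernel and is not the paper's sparse or contiguous tree kernel, and the sentences are made-up toy data.

```python
from collections import Counter
from typing import Callable, List, Sequence

Kernel = Callable[[Sequence[str], Sequence[str]], float]


def bag_of_words_kernel(a: Sequence[str], b: Sequence[str]) -> float:
    """Dot product of the two word-count vectors."""
    ca, cb = Counter(a), Counter(b)
    return float(sum(ca[w] * cb[w] for w in ca.keys() & cb.keys()))


def positional_match_kernel(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical structural stand-in (NOT the paper's tree kernel):
    counts positions at which both sequences hold the same token."""
    return float(sum(1 for x, y in zip(a, b) if x == y))


def sum_kernel(k1: Kernel, k2: Kernel) -> Kernel:
    """Combine two kernels by addition, as in K3 = K0 + K2."""
    return lambda a, b: k1(a, b) + k2(a, b)


def gram_matrix(kernel: Kernel, items: List[Sequence[str]]) -> List[List[float]]:
    """Pairwise similarities; an SVM with a precomputed kernel consumes this."""
    return [[kernel(x, y) for y in items] for x in items]


# Toy sentences, assumed for illustration only.
sentences = [
    "troops advanced near Tikrit".split(),
    "forces moved near Baghdad".split(),
    "the company hired an engineer".split(),
]
combined = sum_kernel(positional_match_kernel, bag_of_words_kernel)
gram = gram_matrix(combined, sentences)
for row in gram:
    print(row)
```

The first two sentences share the word "near" at the same position, so both component kernels contribute to their similarity, while the third sentence has zero similarity to either.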

A binary SVM first detects whether a relation holds between the entities, and then libsvm classifies the relation type.

Reason for the separate binary detection step:

Detecting relations is a difficult task for a kernel method because the set of all non-relation instances is extremely heterogeneous, and is therefore difficult to characterize with a similarity metric.

2. A Shortest Path Dependency Kernel for Relation Extraction

This work builds on dependency trees. A sentence contains a lot of information that is unnecessary for the task. Others have proposed using the minimal subtree, while here the authors use the shortest path.

Approach: Turn the sentence into a graph in which the words are the nodes and the dependency relations are the edges. This gives the shortest path between the two entities. The words, part-of-speech tags, and entity types of the nodes on the shortest path are combined into the final features, and a kernel method with an SVM then classifies the relation.
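A minimal sketch of the path extraction step described above, using breadth-first search over an undirected view of the dependency graph. The edges, part-of-speech tags, and entity types are hypothetical hand-written values for a toy sentence, where a real system would take them from a parser and an NER tool. Using the word itself as the node id assumes no word repeats in the sentence.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical dependency edges (head, dependent) for the toy sentence
# "Protesters seized several stations".
EDGES: List[Tuple[str, str]] = [
    ("seized", "Protesters"),
    ("seized", "stations"),
    ("stations", "several"),
]

# Hypothetical annotations per word: (part-of-speech tag, entity type or None).
ANNOTATIONS: Dict[str, Tuple[str, Optional[str]]] = {
    "Protesters": ("NNS", "PERSON"),
    "seized": ("VBD", None),
    "several": ("JJ", None),
    "stations": ("NNS", "FACILITY"),
}


def build_graph(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Treat dependency edges as undirected so a path can go up and down the tree."""
    graph: Dict[str, List[str]] = {}
    for head, dependent in edges:
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)
    return graph


def shortest_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns the node sequence, or None if unreachable."""
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


graph = build_graph(EDGES)
path = shortest_path(graph, "Protesters", "stations")
# Each node on the path contributes its word, POS tag, and entity type.
path_features = [(word, *ANNOTATIONS[word]) for word in path]
print(path)
print(path_features)
```

The modifier "several" is not on the path between the two entities, so it contributes no features, which is the point of restricting attention to the shortest path.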

Evaluation: The novelty lies in using the shortest dependency path, which resembles the way humans reason about a relation. The drawback is that it still depends on the quality of the NLP tools used, which affects the accuracy of the model.

3. Exploring Various Knowledge in Relation Extraction

This paper studies how to combine lexical, syntactic, and semantic knowledge in SVM-based relation extraction. It shows that chunking information is very effective for relation extraction and accounts for most of the performance gain on the syntactic side, while the additional information from full parsing gives only a limited further improvement.

The authors therefore argue (and confirm experimentally) that most of the useful information in full parse trees for relation extraction is shallow and can be captured by chunking.

Experiments: Uses the ACE dataset and models 6 relation types (24 subtypes). M1-M2 and M2-M1 are treated as two different classes, except for the 6 symmetric subtypes ("Relative-Location", "Associate", "Other-Relative", "Other-Professional", "Sibling", and "Spouse"), and there is also a no-relation class, so a multi-class model with 43 classes in total is built.
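The count of 43 classes follows from the numbers given above: each asymmetric subtype is split into two directed classes, each symmetric subtype stays a single class, and one class is added for "no relation". A small check of that arithmetic:

```python
NUM_SUBTYPES = 24   # ACE relation subtypes, as stated above
NUM_SYMMETRIC = 6   # subtypes for which argument order does not matter

asymmetric = NUM_SUBTYPES - NUM_SYMMETRIC
# Two directed classes per asymmetric subtype, one per symmetric subtype,
# plus one extra class for "no relation".
num_classes = 2 * asymmetric + NUM_SYMMETRIC + 1
print(num_classes)  # 43
```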

Key conclusions:

  • Dependency trees and parse trees bring only limited improvement because most relations in the ACE corpus are short-distance: in more than 70% of relations the two entities are separated by at most one word. Features from dependency trees and parse trees can only help with the remaining long-distance relations. Moreover, although the Collins parser used in the system represents the state of the art in full parsing, full parsing is always prone to errors over long distances.

  • Some relations are harder to detect and classify, such as the AT relation and its subtypes.

  • After adding chunking information, the feature-based method is significantly better than the kernel methods. This suggests that feature-based methods can effectively combine diverse features from different sources (such as WordNet and gazetteers) that bear on relation extraction.

In the analysis of the error distribution, 73% (627/864) of the errors come from relation detection and 27% (237/864) come from relation classification. Of the latter, 17.8% (154/864) are misclassifications across relation types and 9.6% (83/864) are misclassifications of subtypes within the same relation type. This shows that relation detection is the key to relation extraction.

Reading: chunking (block analysis)

https://blog.csdn.net/Sirow/article/details/89306934

Distant supervision

Reading:

  • Summary of distant supervision relation extraction papers: https://zhuanlan.zhihu.com/p/39885744

  • Multi-instance multi-label learning: http://palm.seu.edu.cn/zhangml/files/cccf09-mil&mll.pdf

  • MIML: https://blog.csdn.net/weixin_41108334/details/83048552

1. Distant supervision for relation extraction without labeled data

Core idea: If two entities are known to participate in a relation, then any sentence that contains both entities is likely to express that relation.

In the paper, the authors find that syntactic features perform well and help distantly supervised information extraction. They use conjoined features (lexical and syntactic features are joined together rather than used independently, which is feasible thanks to the large number of samples).

So the existing relations in a knowledge base can be used to find a large number of entity pairs, and the sentences containing each pair are then found and labeled with the corresponding relation. Lexical, syntactic, and semantic features are extracted from these sentences to train the extraction model. Negative examples are built from randomly chosen entity pairs. This strategy generates training samples with less manual labeling, after which features are designed to train the relation classifier.
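The labeling strategy above can be sketched in a few lines. The knowledge base, corpus, relation names, and the `distant_label` helper are all hypothetical toy values. The sketch also assumes that negatives are drawn from co-occurring entity pairs that are unrelated in the knowledge base in either direction, which is one reasonable reading of "random entity pairs".

```python
import random
from itertools import permutations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]
Example = Tuple[str, str, str, str]  # (sentence, entity1, entity2, label)
NO_RELATION = "NO_RELATION"

# Toy knowledge base: (entity1, entity2) -> relation (hypothetical names).
kb: Dict[Pair, str] = {
    ("Barack Obama", "Honolulu"): "place_of_birth",
    ("Google", "Mountain View"): "headquarters",
}

# Toy corpus: each sentence comes with the entities an NER tool would find in it.
corpus: List[Tuple[str, List[str]]] = [
    ("Barack Obama was born in Honolulu.", ["Barack Obama", "Honolulu"]),
    ("Barack Obama visited Honolulu last week.", ["Barack Obama", "Honolulu"]),
    ("Google opened an office near Honolulu.", ["Google", "Honolulu"]),
]


def distant_label(kb, corpus, num_negatives: int, seed: int = 0):
    """Label every sentence containing a KB pair with that pair's relation,
    and sample negatives from co-occurring pairs that the KB does not relate."""
    positives: List[Example] = []
    unrelated: List[Example] = []
    for sentence, entities in corpus:
        for e1, e2 in permutations(entities, 2):
            relation = kb.get((e1, e2))
            if relation is not None:
                positives.append((sentence, e1, e2, relation))
            elif (e2, e1) not in kb:
                unrelated.append((sentence, e1, e2, NO_RELATION))
    rng = random.Random(seed)
    negatives = rng.sample(unrelated, min(num_negatives, len(unrelated)))
    return positives, negatives


positives, negatives = distant_label(kb, corpus, num_negatives=1)
for example in positives + negatives:
    print(example)
```

Note that the second sentence is labeled `place_of_birth` even though it only describes a visit, which illustrates how this assumption introduces noisy labels.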

Advantages: Large datasets can be used without overfitting, and, compared with unsupervised learning, the relations are well defined.

Problems: First, the assumption is too strong: sometimes two entities appear together, but the sentence does not express the relation defined in the knowledge base. There may also be several relations between the two entities, so it is impossible to tell which relation a given sentence expresses. In addition, this labeling method depends on the performance of NER. [NLP tools]

Future work: Whether simpler, chunker-based syntactic features can capture enough information to improve performance without adding the overhead of full parsing.

2. Multi-instance Multi-label Learning for Relation Extraction

This paper mainly addresses a problem raised above for the distant supervision paper: an entity pair can hold more than one relation. Take China-Beijing: Beijing is in China, Beijing is the capital of China, and Beijing is smaller than China. In other words, different sentences that contain the same entity pair can express different relations. The authors therefore propose multi-instance multi-label learning to solve this problem.

The paper gives a simple illustration of multi-instance multi-label learning (figure not shown here).

The paper uses a graphical model with latent variables to model all mentions of an entity pair in the text together with all of its labels, and then uses the EM algorithm to fit the model.
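A sketch of the data layout that the multi-instance multi-label view implies: one bag of sentences per entity pair, with a set of labels attached to the bag rather than to individual sentences. The facts, sentences, and relation names are toy assumptions. The `aggregate` function is a simplified at-least-one rule chosen for illustration, not the paper's learned model, and it only shows how latent per-mention labels relate to the labels of the pair.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]
NIL = "NIL"  # latent label for a mention that expresses no known relation

# Toy inputs (hypothetical relation names).
kb_facts = [
    ("China", "Beijing", "contains"),
    ("China", "Beijing", "capital"),
]
mentions = [
    ("China", "Beijing", "Beijing is the capital of China."),
    ("China", "Beijing", "Beijing is a city in China."),
]


def build_bags(mentions, kb_facts):
    """Group sentences by entity pair and collect all KB labels for each pair."""
    sentences_by_pair: Dict[Pair, List[str]] = defaultdict(list)
    labels_by_pair: Dict[Pair, Set[str]] = defaultdict(set)
    for e1, e2, sentence in mentions:
        sentences_by_pair[(e1, e2)].append(sentence)
    for e1, e2, relation in kb_facts:
        labels_by_pair[(e1, e2)].add(relation)
    return dict(sentences_by_pair), dict(labels_by_pair)


def aggregate(mention_labels: List[str]) -> Set[str]:
    """Simplified assumption: a pair holds every relation that at least one
    of its mentions is assigned, ignoring NIL mentions."""
    return {label for label in mention_labels if label != NIL}


sentences_by_pair, labels_by_pair = build_bags(mentions, kb_facts)
pair = ("China", "Beijing")
print(sentences_by_pair[pair], labels_by_pair[pair])
print(aggregate(["capital", "contains"]))
```

During training the per-mention labels are unknown, which is why they are treated as latent variables and estimated with EM.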

For the EM algorithm, see this explanation:

https://www.zhihu.com/question/40797593/answer/275171156

Open relation extraction

1. Relation Extraction with Matrix Factorization and Universal Schemas

Reading: a summary of schemas

https://www.zhihu.com/question/59624229/answer/167115969

Idea: This paper proposes universal schemas. The relations obtained by open relation extraction and the relations in existing databases are combined into a two-dimensional matrix. Each **row** is an entity pair (taken from the existing database and from the text), and each **column** is a relation, either from a fixed schema or from the open domain. The **columns** are treated as **items**, and the problem is solved in a way similar to collaborative filtering.
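The row and column layout described above can be sketched as a small binary matrix. The observations, entity names, and the `fb:` and `pat:` column prefixes are toy assumptions chosen for illustration.

```python
from typing import List, Tuple

Pair = Tuple[str, str]

# Toy observations: (entity pair, relation). "fb:" marks a fixed-schema relation
# and "pat:" an open-domain surface pattern (hypothetical naming).
observations: List[Tuple[Pair, str]] = [
    (("Ferguson", "Harvard"), "fb:employee"),
    (("Ferguson", "Harvard"), "pat:X-professor-at-Y"),
    (("Oman", "Oxford"), "pat:X-professor-at-Y"),
]


def build_matrix(observations):
    """Rows are entity pairs, columns are relations, a cell is 1 if observed."""
    rows = sorted({pair for pair, _ in observations})
    cols = sorted({relation for _, relation in observations})
    row_index = {pair: i for i, pair in enumerate(rows)}
    col_index = {relation: j for j, relation in enumerate(cols)}
    matrix = [[0] * len(cols) for _ in rows]
    for pair, relation in observations:
        matrix[row_index[pair]][col_index[relation]] = 1
    return rows, cols, matrix


rows, cols, matrix = build_matrix(observations)
print(cols)
for pair, row in zip(rows, matrix):
    print(pair, row)
```

The zero in the `fb:employee` column for ("Oman", "Oxford") is an unobserved cell, and filling in such cells is the prediction task, just as a recommender predicts unseen user-item pairs.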

Model form:

In the paper's figure, part of the columns come from relations extracted by OpenIE and part come from an existing KG such as Freebase.

Core formula:

In short, the model is defined as a combination of several components, each with its own parameters and weight matrices.

The problem, however, is that there are only positive examples and no negative ones, so the learned model tends to predict every case as true.

> **Bayesian Personalized Ranking (BPR)**: uses a variant of this ranking: giving observed true facts higher scores than unobserved (true or false) facts (Rendle et al., 2009).

The initial solution was to construct negative examples by sampling, as in distant supervision, but this did not work well (the quality of the sampled negatives was low and the training cost went up), so the BPR method was used.
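A minimal sketch of one BPR update, assuming a plain latent feature model in which the score of a cell is the dot product of a relation embedding and an entity-pair embedding. This covers only that one component, not the paper's full model, and the vectors, learning rate, and regularization strength are made-up values. The step performs gradient ascent on ln σ(θ⁺ − θ⁻), where θ⁺ is the score of an observed fact and θ⁻ the score of an unobserved fact in the same column.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def bpr_step(rel: List[float], pos: List[float], neg: List[float],
             lr: float = 0.1, reg: float = 0.01) -> float:
    """One stochastic gradient ascent step on ln sigmoid(score_pos - score_neg).

    rel: embedding of the relation (column)
    pos: embedding of an entity pair observed with this relation
    neg: embedding of an entity pair NOT observed with this relation
    Returns the margin before the update. Vectors are updated in place.
    """
    margin = dot(rel, pos) - dot(rel, neg)
    g = 1.0 - sigmoid(margin)  # derivative of ln sigmoid(margin) w.r.t. margin
    for c in range(len(rel)):
        r, p, n = rel[c], pos[c], neg[c]
        rel[c] += lr * (g * (p - n) - reg * r)
        pos[c] += lr * (g * r - reg * p)
        neg[c] += lr * (-g * r - reg * n)
    return margin


# Toy embeddings (assumed values) that initially rank the unobserved fact higher.
relation = [0.5, 0.1]
observed_pair = [-0.2, 0.1]
unobserved_pair = [0.2, 0.1]

margins = [bpr_step(relation, observed_pair, unobserved_pair) for _ in range(100)]
print(round(margins[0], 3), margins[-1] > 0)
```

In a full training loop the unobserved pair would be resampled at every step from the cells of the same column that are not in the observed data, so no explicit negative set has to be constructed.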

The question to answer:

Does reasoning jointly over a universal schema help to improve over more isolated approaches?

First, in the data processing step, the entity mentions in the New York Times corpus are linked to Freebase entities, and the result is then filtered.

> Based on this alignment we filter out all relations [...]

Then the matrix is built. For each tuple $t$, the set of observed relation instances $O_t$ consists of two parts, the Freebase relations and the surface patterns: $O_t = O_t^{FB} \cup O_t^{PAT}$.

In this way, a matrix is established from a dataset.

For evaluation, a precision-recall curve is built. Precision is computed as follows: for each relation, take the top 1,000 entity pairs, pool the top 100, and manually judge their relevance or truth. The judged results are then used to compute recall and precision.
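Given a ranked list of manually judged answers, the points of a precision-recall curve can be computed as in the sketch below. The judgments are made-up toy values, and recall is measured relative to the facts judged true in the pool, which is an assumption about how the denominator is chosen.

```python
from typing import List, Tuple


def precision_recall_points(judgments: List[bool],
                            total_relevant: int) -> List[Tuple[float, float]]:
    """Return (recall, precision) after each rank of a judged, ranked list."""
    points = []
    hits = 0
    for k, is_correct in enumerate(judgments, start=1):
        hits += int(is_correct)
        points.append((hits / total_relevant, hits / k))
    return points


# Toy judgments for the top 5 ranked entity pairs of one relation.
judgments = [True, True, False, True, False]
points = precision_recall_points(judgments, total_relevant=sum(judgments))
for recall, precision in points:
    print(f"recall={recall:.2f} precision={precision:.2f}")
```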

So open relation extraction is only a tool for obtaining the data. The focus of this paper is the matrix and the corresponding parameter estimation method.
