word2vec (3 posts)

[Reading Papers Together] Distributed Representations of Words and Phrases and their Compositionality

Paper link: Distributed Representations of Words and Phrases and their Compositionality (arxiv.org). "The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extens..." Resource links: - Gives a detailed explanation of the model's structure and operation - w... (A minimal skip-gram training sketch follows this entry.)

Data/Paper Reading 2022.10.05
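Since the excerpt only gestures at what the skip-gram model does, here is a minimal sketch of training one with negative sampling using gensim. The toy corpus and every hyperparameter value are my own illustrative assumptions, not taken from the post or from the paper's experiments.

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus: a list of tokenized sentences (assumed toy data).
corpus = [
    ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"],
    ["distributed", "representations", "capture", "word", "relationships"],
]

# sg=1 selects the skip-gram architecture; negative=5 enables negative sampling
# and sample enables frequent-word subsampling, two of the paper's extensions.
model = Word2Vec(
    sentences=corpus,
    vector_size=100,  # dimensionality of the learned word vectors
    window=5,         # context window size on each side of the center word
    sg=1,             # 1 = skip-gram (0 would be CBOW)
    negative=5,       # number of noise words drawn per positive pair
    sample=1e-3,      # subsampling threshold for very frequent words
    min_count=1,      # keep every token in this tiny corpus
    epochs=50,
)

# Nearest neighbors by cosine similarity in the learned vector space.
print(model.wv.most_similar("fox", topn=3))
```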

[Reading Papers Together] Efficient Estimation of Word Representations in Vector Space

Paper link: Efficient Estimation of Word Representations in Vector Space (arxiv.org). "We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best per..." Reference link: 02) Word2Vec. As discussed earlier, with one-hot vectors, computing a meaningful similarity between word vectors... (A short illustration of this point follows this entry.)

Data/Paper Reading 2022.09.29
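The excerpt breaks off while making the point that one-hot vectors do not support meaningful similarity. A small sketch (my addition, using NumPy) shows why: distinct one-hot vectors are always orthogonal, so their cosine similarity is 0 regardless of word meaning, which is exactly the gap Word2Vec's dense vectors close.

```python
import numpy as np

# One-hot encode a toy vocabulary (illustrative assumption).
vocab = ["king", "queen", "apple"]
one_hot = {word: np.eye(len(vocab))[i] for i, word in enumerate(vocab)}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Every pair of distinct one-hot vectors is orthogonal, so the similarity
# is 0.0 whether or not the words are semantically related.
print(cosine(one_hot["king"], one_hot["queen"]))  # 0.0
print(cosine(one_hot["king"], one_hot["apple"]))  # 0.0
```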

Data Augmentation (CSV&TXT) using Back Translation

Data Augmentation (CSV&TXT) using Back Translation (www.kaggle.com). This post reviews the code in the notebook linked above, which I analyzed while taking part in a Kaggle competition. It applies Data Augmentation to an NLP problem to grow the training set. For the augmentation, it installs a library called nlpaug and imports the related packages (nlpaug 1.1.10 documentation)... (A minimal back-translation sketch follows this entry.)

Data/Code Review 2022.03.15
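The excerpt describes installing nlpaug and importing its packages for back translation. Below is a minimal sketch of that step using nlpaug's BackTranslationAug; the translation model names are the library's documented defaults, and whether the reviewed notebook uses these exact models is an assumption on my part.

```python
# pip install nlpaug torch transformers sacremoses
import nlpaug.augmenter.word as naw

# Back translation: translate English -> German -> English to paraphrase text.
aug = naw.BackTranslationAug(
    from_model_name="facebook/wmt19-en-de",  # source -> pivot language
    to_model_name="facebook/wmt19-de-en",    # pivot -> source language
)

text = "The quick brown fox jumps over the lazy dog."
# In recent nlpaug versions, augment() returns a list of augmented strings.
print(aug.augment(text))
```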