leverage: to use, to make use of; very common. Example: By leveraging the Vehicle-to-Vehicle (V2V) communication technology, different Connected Automated Vehicles (CAVs) can share their sensing information and thus provide multiple viewpoints for the same obstacle to compensate each other [1].
heterogeneous: of different kinds, non-uniform; often used with graphs. Example: V2X-ViT is composed of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention, capturing inter-agent interaction and per-agent spatial relationships [2].
adverse: unfavorable. Example: Our experiments show that multi-camera configurations are critical in overcoming adverse conditions in large-scale outdoor scenes [3].
streamline: to simplify. Example: We streamline the training pipeline by viewing object detection as a direct set prediction problem [4].
reason: to infer; often used for reasoning about relations. Example: DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel [4].
bypass: to skip, to go around
invariant: unchanged (under some transformation). Examples: a point cloud is just a set of points and therefore invariant to permutations of its members (PointNet). Since the transformer architecture is permutation-invariant, ... [4].
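To make the "permutation-invariant" sense of invariant concrete, here is a minimal Python sketch. It assumes a toy per-point feature, and the function names are hypothetical; this is not PointNet's actual network, which uses a learned shared MLP. A symmetric aggregation such as an element-wise max gives the same result for every ordering of the points, whereas concatenating the points in the given order does not.

```python
from itertools import permutations

def per_point_feature(point):
    # Hypothetical per-point feature; PointNet uses a shared MLP here instead
    x, y = point
    return (x + y, x * y)

def global_feature(points):
    # Symmetric aggregation: element-wise max over the set, so order does not matter
    feats = [per_point_feature(p) for p in points]
    return tuple(max(f[i] for f in feats) for i in range(2))

def concat_feature(points):
    # Order-sensitive baseline: flatten the points in the order given
    return tuple(v for p in points for v in p)

cloud = [(1, 2), (3, -1), (0, 4)]
orderings = list(permutations(cloud))

print(len({global_feature(list(o)) for o in orderings}))  # 1 distinct result
print(len({concat_feature(list(o)) for o in orderings}))  # 6 distinct results
```

Any symmetric function (max, sum, mean) would work as the aggregator; max is used here only because it is the one PointNet is best known for.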
omit: to leave out, to ignore
auxiliary loss: an additional loss term that supplements the main training objective
philosophy: guiding principle, approach. Example: This end-to-end philosophy has led to significant advances for ... [4].
fidelity: realism, accuracy of reproduction; common in simulation. Example: However, this can be a complex and costly endeavor that requires constructing realistic virtual worlds and developing high-fidelity sensor simulators [5].
full autonomy stack: the complete self-driving software stack, including perception, localization, planning, and control [5].
gauge: to measure.
downstream task/experiment: a task further along the autonomous-driving pipeline; common ones are motion planning and control. Example: To gauge the usefulness of using our simulations to test motion planning, we conduct downstream experiments with two motion planners [5].
design ethos: design philosophy. Example: The design ethos of DETR easily extends to more complex tasks [4].
ingredients: essential elements. Example: Two ingredients are essential for direct set predictions in detection [4].
reliance: dependence. Example: We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks [6].
fashion: manner, way. Example: We train the model on image classification in supervised fashion [6].
trump: to beat, to outweigh. Example: We find that large scale training trumps inductive bias [6].
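inductive biases: assumptions built into a model, usually distilled from observations about the real world, that constrain what it learns. For instance, CNNs are designed to preserve spatial relationships because we observe that the relative positions of pixels in an image matter. Example: Transformers lack some of the inductive biases inherent (another common word, meaning intrinsic) to CNNs, such as translation equivariance and locality [6].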
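The translation equivariance named in the example sentence can be checked with a minimal pure-Python sketch. It assumes a 1D signal, wrap-around (circular) padding so that shifts are exact, and a hypothetical 3-tap kernel; the function names are illustrative. Shifting the input and then convolving gives the same output as convolving and then shifting. Each output also depends only on a small window of neighboring inputs, which is the locality bias.

```python
def circular_conv1d(signal, kernel):
    # Cross-correlation (what deep-learning "convolution" layers compute),
    # with wrap-around padding so that shifting is exact at the borders
    n = len(signal)
    return [sum(w * signal[(i + j) % n] for j, w in enumerate(kernel)) for i in range(n)]

def shift(signal, s):
    # Circularly shift the signal right by s positions
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]

x = [0, 1, 5, 2, 0, 0, 3, 0]
k = [1, -2, 1]  # hypothetical 3-tap kernel

conv_then_shift = shift(circular_conv1d(x, k), 2)
shift_then_conv = circular_conv1d(shift(x, 2), k)
print(conv_then_shift == shift_then_conv)  # True: the layer is translation-equivariant
```

A fully connected layer with an arbitrary weight matrix has no such guarantee, which is one way to read "Transformers lack some of the inductive biases inherent to CNNs".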
alternating: taking turns; often used to describe several kinds of modules that repeat in turn within a model. Example: The Transformer encoder consists of alternating layers of multiheaded self-attention and MLP blocks [6].
appetite: demand, need. Example: This appetite for data has been successfully addressed in natural language processing (NLP) by self-supervised pretraining [7].
harness: to put to full use; often used when a model or method has not yet been fully exploited. Example: However, its full power remains to be harnessed through the advent of new smart technologies, such as self-driving vehicles [8].
indispensable: essential, cannot be done without. Example: Detecting objects such as cars and pedestrians in 3D plays an indispensable role in autonomous driving [9].
alleviate: to reduce, to relieve. Example: To alleviate the local misalignment, we use a 2D-3D bounding box consistency constraint [10].
account for: to make up (a share of something). Example: Consequently, errors in depth estimation account for the major part of the gap between pseudo-lidar and lidar-based detectors [10].
consensus: agreement. Example: Hence, we propose a novel neural reasoning (that word again) framework that learns to communicate, to estimate potential errors, and finally, to reach a consensus about those errors [11].
surrogate: substitute, stand-in; common in adversarial attacks. Example: In this setting, the attacker trains a surrogate model that imitates the target model [12].
degrade: to lower or impair; closely tied to performance. Similar words include damage, decrease, and drop. Example: In addition, we also aim to minimize the intersection-over-union (IoU) of the bounding box proposals to further degrade performance by producing poorly localized objects [12].
hallucinating/hallucination: literally, perceiving something that is not there. In CV it is common in amodal detection, where the input does not directly show an object but the network can infer where it is. Example: We dub this problem amodal scene layout estimation, which involves hallucinating scene layout for even parts of the world that are occluded in the image [13].
de facto: actual, in practice. The phrase is Latin, but many papers use it, and it should be set in italics. Example: We thus argue that MetaFormer is our de facto need for vision models that deserved more future research [14].
empirically: by experiment; usually used to say that something is shown through experiments. Example: we empirically demonstrate that the success of the transformer model is largely attributed to the MetaFormer architecture [14].
rudimentary: basic, elementary. Example: Currently, rudimentary resizing methods such as nearest neighbor, bilinear, and bicubic are among the top adopted image resizers in visual recognition systems [16].
off-the-shelf: ready-made. Example: The proposed resizer, therefore, can be an alternative to the off-the-shelf resizers to effectively reduce the expected drop in the recognition performance [16].
Common sentence patterns in CV papers:
To say that a model is popular: xxx have become the model of choice in (some area).
To address a problem: To ameliorate these issues, ...
To state a purpose: Towards the goal of xxx
To exploit strengths: leveraged the advantages of
To express contrast: whereas
To say that some problems have been solved but others are overlooked: While much of the research into xxx (a certain area) has focused on (a solved problem), an equally important yet underexplored problem is how to (an unsolved problem). Example: While much of the research into microscopic traffic simulation has focused on modeling actors' behaviors, an equally important yet underexplored problem is how to generate realistic snapshots of traffic scenes. (Source: SceneGen: Learning to Generate Realistic Traffic Scenes, CVPR)
To say that experiments prove the importance of an idea: Our experiments emphasize/demonstrate the necessity/importance of xxx [3].
To place the work within a particular setting: in the context of. Example: In this paper, we consider the collaborative perception in the context of a heterogeneous multi-agent system [15].
To say that an element plays a key role: xxx play(s) a critical role / the success of something is attributed to xxx
References
[1] OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication (ICRA 2022)
[2] V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer (ECCV 2022)
[3] Asynchronous Multi-View SLAM (ICRA 2021)
[4] DETR: End-to-End Object Detection with Transformers (ECCV 2020)
[5] Testing the Safety of Self-driving Vehicles by Simulating Perception and Prediction (ECCV 2020)
[6] An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale (ICLR 2021)
[7] Masked Autoencoders Are Scalable Vision Learners (CVPR 2022)
[8] Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks (ICRA 2019)
[9] Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving (ICLR 2020)
[10] Is Pseudo-Lidar Needed for Monocular 3D Object Detection? (ICCV 2021)
[11] Learning to Communicate and Correct Pose Errors (CoRL 2020)
[12] Adversarial Attacks on Multi-Agent Communication (ICCV 2021)
[13] MonoLayout: Amodal Scene Layout from a Single Image (WACV 2020)
[14] MetaFormer Is Actually What You Need for Vision (CVPR 2022)
[15] Learning Distilled Collaboration Graph for Multi-Agent Perception (NeurIPS 2021)
[16] Learning to Resize Images for Computer Vision Tasks (ICCV 2021)